I'm running the mean shift canopy driver over a pretty slow VPN connection and 
it appears to be resubmitting the job.jar for each iteration. When I run

./bin/mahout org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver 
-Dmapred.reduce.tasks=3 -i syntheticControl -o output -ic true -ow -x 10 -dm 
org.apache.mahout.common.distance.EuclideanDistanceMeasure -cd 0.0001 -t1 47.6 
-t2 1 -cl

... it prints the first iteration message to the console immediately, then
pauses for a minute or two while the jar uploads, then runs the iteration. It
then prints the next iteration message immediately and pauses again, and so on
for every iteration. It looks to me like bin/mahout is running the driver
locally, and each job it submits is uploaded to and run on the cluster
separately. I never noticed this on the fast network in the office. Is this
typical?
