I'm running the mean shift canopy driver over a fairly slow VPN connection, and it appears to resubmit job.jar on every iteration. When I run

./bin/mahout org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver -Dmapred.reduce.tasks=3 -i syntheticControl -o output -ic true -ow -x 10 -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -cd 0.0001 -t1 47.6 -t2 1 -cl

... it prints the first iteration message to the console immediately, then pauses for a minute or two while it uploads the jar, then runs the iteration. It then prints the next iteration message immediately and pauses again, and the same delay repeats for every iteration. It looks to me like bin/mahout runs the driver locally, and each job the driver submits is executed remotely on the cluster. I never noticed this on the fast network in the office. Is this typical?
