Actually, hadoop jar ... is doing the same thing, so this is unrelated to Mahout.
-----Original Message-----
From: Jeff Eastman [mailto:[email protected]]
Sent: Friday, July 22, 2011 3:14 PM
To: [email protected]
Subject: Odd Behavior

I'm running the mean shift canopy driver over a pretty slow VPN connection and it appears to be resubmitting the job.jar for each iteration. When I run

    ./bin/mahout org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver -Dmapred.reduce.tasks=3 -i syntheticControl -o output -ic true -ow -x 10 -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -cd 0.0001 -t1 47.6 -t2 1 -cl

... it prints out the first iteration citation to the transcript immediately, then delays for a minute or two to upload the jar, then runs the iteration, then displays the next iteration citation immediately and delays for each iteration.

It looks to me like bin/mahout is running the driver locally, and each job submission from it is getting invoked remotely on the cluster. On the fast network in the office I never noticed this before. Is this typical?
