Hi all,

I'm running Hadoop 0.19.2 on EC2 and am running into an occasional problem with ClusterStatus.getTaskTrackers().

The call to getTaskTrackers() is made in the job jar's main function, before the job starts running. I need it to control some aspects of my job, for example to set the number of reduce tasks exactly equal to the number of servers, which should be the same as the number of task trackers.

Every so often (currently in fewer than 5% of runs) the call to getTaskTrackers() returns a lower count than expected, e.g. 2 instead of 6. This happens even when ClusterStatus.getJobTrackerState() returns State.RUNNING.

I'm assuming the problem is that some of the task trackers take extra time to spin up. I saw HADOOP-5337 (https://issues.apache.org/jira/browse/HADOOP-5337), which seems related, though that issue concerns restarts rather than initial startup.

Given that the JobTracker waits for slaves to self-report, there doesn't seem to be a totally reliable, automatic solution to this issue, but I thought I'd ask to see if there's something I'm missing.
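A possible workaround (not a guaranteed fix, since it only narrows the race) is to poll the cluster status from main() until the expected number of task trackers has reported, giving up after a timeout. The Java sketch below assumes the expected count is known in advance (e.g. the number of EC2 slaves launched). The `TrackerCountSource` interface, the `waitForTrackers` method, and the timeout and poll values are hypothetical names for illustration, not Hadoop APIs.

```java
import java.util.concurrent.TimeUnit;

/** Sketch: wait (with a timeout) until enough task trackers have reported. */
public class TrackerWait {

    /** Hypothetical abstraction over "how many task trackers are live right now". */
    public interface TrackerCountSource {
        int currentCount() throws Exception;
    }

    /**
     * Polls {@code source} every {@code pollMillis} until it reports at least
     * {@code expected} trackers or {@code timeoutMillis} elapses.
     * Returns the last observed count; a value below {@code expected}
     * means the timeout expired first.
     */
    public static int waitForTrackers(TrackerCountSource source, int expected,
                                      long timeoutMillis, long pollMillis)
            throws Exception {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        int count = source.currentCount();
        // Compare via subtraction so the check is safe if nanoTime() wraps.
        while (count < expected && System.nanoTime() - deadline < 0) {
            Thread.sleep(pollMillis);
            count = source.currentCount();
        }
        return count;
    }
}
```

In a 0.19 job, the source would presumably wrap `new JobClient(conf).getClusterStatus().getTaskTrackers()` (written as an anonymous class, since Java 6 has no lambdas), and the returned count would then be passed to `conf.setNumReduceTasks(...)`. If the count is still short when the timeout expires, the caller has to decide whether to fail the job or proceed with fewer reducers.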

Thanks,

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g