> On June 3, 2013, 5:36 p.m., Ben Mahler wrote:
> > What differences were you seeing between job.pendingMaps() and this new 
> > technique?
> > 
> > Looking at JobInProgress.pendingMaps():
> > 
> >   public synchronized int pendingMaps() {
> >     return numMapTasks - runningMapTasks - failedMapTIPs - 
> >     finishedMapTasks + speculativeMapTasks;
> >   }
> > 
> > vs. jobdetails_jsp.printTaskSummary() {
> >    int totalTasks = tasks.length;
> >     int runningTasks = 0;
> >     int finishedTasks = 0;
> >     int killedTasks = 0;
> >     int failedTaskAttempts = 0;
> >     int killedTaskAttempts = 0;
> >     for(int i=0; i < totalTasks; ++i) {
> >       TaskInProgress task = tasks[i];
> >       if (task.isComplete()) {
> >         finishedTasks += 1;
> >       } else if (task.isRunning()) {
> >         runningTasks += 1;
> >       } else if (task.wasKilled()) {
> >         killedTasks += 1;
> >       }
> >       failedTaskAttempts += task.numTaskFailures();
> >       killedTaskAttempts += task.numKilledTasks();
> >     }
> >     int pendingTasks = totalTasks - runningTasks - killedTasks - 
> > finishedTasks; 
> >     ...
> > }
> > 
> > It seems like the difference here might be between failed_ vs. killed_ 
> > and/or the fact that the latter case uses speculativeMapTasks?
> 
> Brenden Matthews wrote:
>     Yes, possibly.  To be honest I don't remember, it's been months since I 
> fixed this.  All I remember specifically is that it was wrong.
> 
> Ben Mahler wrote:
>     Ah, yes that's what I'm getting at, wrong in what way?
> 
> Ben Mahler wrote:
>     Hey Brenden, I would really like to get this change in as I trust that 
> you've found an issue with the code. But for posterity, it would be nice to 
> have a clearer explanation of what was wrong with the old code (so if we run 
> into bugs again we can understand why we choose to do this instead of just 
> call pendingMaps/Reduces). Perhaps we need a different term for "pending", 
> i.e., we're looking for tasks that don't have a slot open to run on.
>     
>     Do you remember how you noticed there was a problem? Did you notice in 
> production? Was it because pending was 0, despite there being map tasks that 
> needed to be run? Was the symptom that there were tasks that were not running 
> because pending was 0?
> 
> Brenden Matthews wrote:
>     Hi Ben,
>     
>     As I recall, I was finding that the values Mesos was reporting in the log 
> did not match the Hadoop JobTracker web UI.  In fact, on many occasions the 
> values for the pending tasks were negative.  Since it's not possible to have 
> a negative count, it was obviously broken.
>     
>     I thought perhaps it was related to this issue:
>     
>     
> https://mail-archives.apache.org/mod_mbox/hadoop-common-commits/201204.mbox/%[email protected]%3E
>     
>     But the build of Hadoop I was running had this patch.  I decided that 
> rather than patching Hadoop, I should just use the same calculation that the 
> web UI was using.

Perfect! Can you mention that negative values were observed, and reference 
https://issues.apache.org/jira/browse/MAPREDUCE-1238? It would be great to 
mention that you've seen discrepancies with getPendingMaps/Reduces and what the 
webUI reports as pending.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11116/#review21326
-----------------------------------------------------------


On June 6, 2013, 2:08 a.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11116/
> -----------------------------------------------------------
> 
> (Updated June 6, 2013, 2:08 a.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> Fix TaskTracker pending tasks calculation.
> 
> Review: https://reviews.apache.org/r/11116
> 
> 
> Diffs
> -----
> 
>   hadoop/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java 
> afe401f5265e3d9494af7eace42eec45943184a3 
> 
> Diff: https://reviews.apache.org/r/11116/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>

Reply via email to