[ 
https://issues.apache.org/jira/browse/HADOOP-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701570#action_12701570
 ] 

Khaled Elmeleegy commented on HADOOP-5632:
------------------------------------------



Coarse-grained locking is only part of the problem. The second, and likely
more important, part is that the heartbeat handler is heavyweight. The
handler's runtime was 3+ ms for my workload; for other workloads it can go
higher.

Suppose we bring the handler's runtime down to 500 us (which is well within
reach), and assume a 4000-node cluster with 10 slots per node where each map
runs for 20 s. That results in 2000 heartbeats per second (40,000 slots,
each finishing a task every 20 s). The JT can keep up with that even with
locking left unchanged. Obviously, we'll need to fix the locking in the
future, but I think that lighter-weight heartbeats can be a quicker fix.
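
The estimate above can be checked with a rough back-of-the-envelope sketch.
It assumes one heartbeat per task completion and a handler that is fully
serialized behind a single lock. The function names are hypothetical and
are not part of Hadoop.

```python
def heartbeat_rate(nodes, slots_per_node, task_seconds):
    """Heartbeats per second, assuming one heartbeat per task completion."""
    return nodes * slots_per_node / task_seconds


def handler_utilization(rate_per_s, handler_seconds):
    """Seconds of handler work demanded per wall-clock second.

    Assumes every heartbeat is handled under one global lock, so values
    above 1.0 mean the handler cannot keep up.
    """
    return rate_per_s * handler_seconds


# Cluster parameters taken from the comment: 4000 nodes, 10 slots, 20 s maps.
rate = heartbeat_rate(nodes=4000, slots_per_node=10, task_seconds=20)

# Compare the measured handler cost (3 ms) with the proposed target (500 us).
for label, cost in (("3 ms", 3e-3), ("500 us", 500e-6)):
    load = handler_utilization(rate, cost)
    print(f"{label}: {rate:.0f} heartbeats/s, handler load {load:.2f}")
```

Under these assumptions the 500 us target puts the load at about 1.0, right
at the limit of a single serialized handler, while the measured 3 ms puts it
at about 6. Any heartbeats beyond task completions would push both higher.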



> Jobtracker leaves tasktrackers underutilized
> --------------------------------------------
>
>                 Key: HADOOP-5632
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5632
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>         Environment: 2x HT 2.8GHz Intel Xeon, 3GB RAM, 4x 250GB HD linux 
> boxes, 100 node cluster
>            Reporter: Khaled Elmeleegy
>         Attachments: hadoop-khaled-tasktracker.10s.uncompress.timeline.pdf, 
> hadoop-khaled-tasktracker.150ms.uncompress.timeline.pdf, jobtracker.patch, 
> jobtracker20.patch
>
>
> For some workloads, the jobtracker doesn't keep all the slots utilized even 
> under heavy load.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
