[ https://issues.apache.org/jira/browse/HADOOP-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649443#action_12649443 ]
Runping Qi commented on HADOOP-4623:
------------------------------------

Currently, runningMapCache is logically a map from Node -> Collection<TaskInProgress>. Whenever a task is scheduled, a tip is added to this structure. Whenever a task completes, the tip is deleted from it. The structure is currently implemented as a LinkedHashMap, which means each operation involves link manipulation and object creation. I suspect that performance would improve if a more efficient data structure were used.

Here is an idea: use a HashMap mapping nodes to fixed-size arrays of tips. The fixed size should be the number of slots per node. This simple data structure needs to be initialized only once. Any add/delete operation then simply sets a reference in a fixed-size array. No object creation is involved, so the overhead will be lower and more predictable.

> Running tasks are not maintained by JobInProgress if speculation is off
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4623
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4623
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: HADOOP-4623-v1.1.patch, HADOOP-4623-v1.2.patch
>
>
> {{JobInProgress}} doesn't maintain any structure for running tasks if
> speculation is turned _off_. {{getRunningMapCache()}} in {{JobInProgress}}
> exposes the running map cache. This API returns an empty {{Map}} if
> speculation is turned off.
> _Usage_ :
> {{CapacityScheduler}} requires a list of running tasks for both speculated
> and non-speculated jobs. See HADOOP-4558 to see how this issue affects
> {{CapacityScheduler}}.
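
For illustration, the fixed-size array idea from the comment could be sketched roughly as follows. This is only a sketch under assumptions, not code from any attached patch: the class name SlotArrayCache and its methods are hypothetical, plain String node names stand in for Node, and the type parameter T stands in for TaskInProgress.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: maps a node to a fixed-size array of running tips.
class SlotArrayCache<T> {
    // One array per node, allocated once when the node is registered.
    private final Map<String, Object[]> slotsByNode = new HashMap<String, Object[]>();
    private final int slotsPerNode;

    SlotArrayCache(int slotsPerNode) {
        if (slotsPerNode <= 0) {
            throw new IllegalArgumentException("slotsPerNode must be positive");
        }
        this.slotsPerNode = slotsPerNode;
    }

    // The only place an array is allocated; repeated calls for the same node are no-ops.
    void registerNode(String node) {
        if (!slotsByNode.containsKey(node)) {
            slotsByNode.put(node, new Object[slotsPerNode]);
        }
    }

    // Stores the tip in the first free slot. Returns the slot index,
    // or -1 if the node is unknown or all of its slots are occupied.
    int add(String node, T tip) {
        Object[] slots = slotsByNode.get(node);
        if (slots == null) {
            return -1;
        }
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) {
                slots[i] = tip;
                return i;
            }
        }
        return -1;
    }

    // Clears the slot holding this exact tip (reference comparison).
    // Returns false if the node is unknown or the tip is not present.
    boolean remove(String node, T tip) {
        Object[] slots = slotsByNode.get(node);
        if (slots == null) {
            return false;
        }
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == tip) {
                slots[i] = null;
                return true;
            }
        }
        return false;
    }

    // Counts the occupied slots on a node; an unknown node counts as 0.
    int runningCount(String node) {
        Object[] slots = slotsByNode.get(node);
        if (slots == null) {
            return 0;
        }
        int count = 0;
        for (Object slot : slots) {
            if (slot != null) {
                count++;
            }
        }
        return count;
    }
}
```

In this sketch, add and remove scan the array linearly. Since the array length is the number of slots per node, the scan is short and allocates nothing, at the cost of losing the insertion order that a LinkedHashMap keeps.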