[ https://issues.apache.org/jira/browse/HADOOP-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646169#action_12646169 ]
Vivek Ratan commented on HADOOP-4558:
-------------------------------------

I'd go with #2 (yes, you need to make sure that no code is relying on the fact that the data structures for running tasks are empty if speculative execution is turned off). Granted, you're keeping extra state for jobs with spec execution turned off, but the number of running tasks cannot exceed the cluster capacity, so you're bounded.

Option #1 duplicates code between the Capacity Scheduler & JobInProgress, and Option #3 is expensive, though we do a linear scan only when killing tasks, which shouldn't happen very often.

> Scheduler fails to reclaim capacity if Jobs are submitted to queue one after the other
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4558
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4558
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>         Environment: Cluster Capacity Maps=Reduces =210 each
> Two Queues:
> Q1: default, GC (%) =40, GC=84 (Maps and Reduces each). Reclaim time = 3 mins.
> Q2: test_q1, GC (%) =60, GC=126 (Maps and Reduces each) Reclaim time = 2 mins
>            Reporter: Karam Singh
>         Attachments: 4558.1.patch
>
>
> Scheduler fails to reclaim capacity if Jobs are submitted to queue one after the other.
> First job submitted with tasks equal to cluster's M/R Capacity
> Second is submitted to different queue when all tasks of First Job are running, scheduler fails to reclaim capacity for second job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.