[ https://issues.apache.org/jira/browse/MAPREDUCE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915581#action_12915581 ]

Scott Chen commented on MAPREDUCE-1819:
---------------------------------------

It seems to me that the intention of the following part of RaidNode.java is not
very clear.

{code}
            int runningJobsCount = jobMonitor.runningJobsCount(info.getName());
            // Is there a scan in progress for this policy?
            if (scanState.containsKey(info.getName())) {
              // If there is a scan in progress for this policy, we can have
              // upto maxJobsPerPolicy running jobs.
              if (runningJobsCount >= maxJobsPerPolicy) {
                continue;
              }
            } else {
              // If there isn't a scan in progress for this policy, we don't
              // want to start a fresh scan if there is even one running job.
              if (runningJobsCount >= 1) {
                continue;
              }
            }
{code}

Also, the logic that checks the policy's period is inside selectFiles().
Maybe we can pull it into a method (something like shouldProcessPolicy()) along with
the above logic.
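
For example, roughly (just a sketch against the fields used in the snippet above; the
period check is only indicated by a comment, since that logic currently sits inside
selectFiles()):

{code}
// Sketch only: folds the running-jobs check above and the period check
// from selectFiles() into one place.
private boolean shouldProcessPolicy(PolicyInfo info) {
  int runningJobsCount = jobMonitor.runningJobsCount(info.getName());
  if (scanState.containsKey(info.getName())) {
    // A scan is already in progress for this policy: allow up to
    // maxJobsPerPolicy concurrently running jobs.
    if (runningJobsCount >= maxJobsPerPolicy) {
      return false;
    }
  } else {
    // No scan in progress: don't start a fresh scan if there is even
    // one running job.
    if (runningJobsCount >= 1) {
      return false;
    }
  }
  // The period-of-policy check from selectFiles() would move here as well,
  // so callers only need a single shouldProcessPolicy(info) call.
  return true;
}
{code}

Then the loop body reduces to a single "if (!shouldProcessPolicy(info)) continue;".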



> RaidNode should be smarter in submitting Raid jobs
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1819
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1819
>             Project: Hadoop Map/Reduce
>          Issue Type: Task
>          Components: contrib/raid
>    Affects Versions: 0.20.1
>            Reporter: Ramkumar Vadali
>            Assignee: Ramkumar Vadali
>         Attachments: MAPREDUCE-1819.patch, MAPREDUCE-1819.patch.2
>
>
> The RaidNode currently computes parity files as follows:
> 1. Using RaidNode.selectFiles() to figure out what files to raid for a policy
> 2. Using #1 repeatedly for each configured policy to accumulate a list of files.
> 3. Submitting a mapreduce job with the list of files from #2 using DistRaid.doDistRaid()
> This task addresses the fact that #2 and #3 happen sequentially. The proposal is to submit a separate mapreduce job for the list of files for each policy and use another thread to track the progress of the submitted jobs. This will help reduce the time taken for files to be raided.
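
For context, a rough sketch of the per-policy flow described above (configuredPolicies,
submitJobForPolicy and monitorJob are illustrative names only, not the patch's actual API,
and the List<FileStatus> return type of selectFiles() is assumed here):

{code}
// Illustrative only: submit one raid job per policy and let a separate
// monitor thread track progress, instead of one sequential
// DistRaid.doDistRaid() call over all policies' files.
for (PolicyInfo info : configuredPolicies) {
  // step 1: pick the files to raid for this policy
  List<FileStatus> files = selectFiles(info);
  if (!files.isEmpty()) {
    // step 3 becomes a non-blocking submission; the monitor thread
    // tracks completion afterwards
    DistRaid job = submitJobForPolicy(info, files);  // hypothetical helper
    jobMonitor.monitorJob(info.getName(), job);      // hypothetical method
  }
}
{code}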

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
