[ https://issues.apache.org/jira/browse/MAPREDUCE-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871959#action_12871959 ]

Ramkumar Vadali commented on MAPREDUCE-1819:
--------------------------------------------

This becomes important when some of the parity files have been compacted 
into HAR archives. RaidNode.selectFiles becomes very slow in such cases because 
it repeatedly has to fetch data blocks of the HAR archive just to read metadata.

> RaidNode should submit one job per Raid policy
> ----------------------------------------------
>
>                 Key: MAPREDUCE-1819
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1819
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>    Affects Versions: 0.20.1
>            Reporter: Ramkumar Vadali
>
> The RaidNode currently computes parity files as follows:
> 1. Using RaidNode.selectFiles() to figure out what files to raid for a policy
> 2. Using #1 repeatedly for each configured policy to accumulate a list of 
> files. 
> 3. Submitting a mapreduce job with the list of files from #2 using 
> DistRaid.doDistRaid()
> This task addresses the fact that #2 and #3 happen sequentially: no job is 
> submitted until files for every policy have been selected. The proposal is to 
> submit a separate mapreduce job for each policy's list of files as soon as it 
> is ready, and use another thread to track the progress of the submitted jobs. 
> This will help reduce the time taken for files to be raided.
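The per-policy submission and asynchronous tracking described in the proposal can be sketched with plain Java concurrency primitives. This is only an illustration of the pattern: the class, method, and policy names below are hypothetical, and in RaidNode the submitted work would be DistRaid.doDistRaid() over the files returned by RaidNode.selectFiles() for each policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerPolicySubmit {
    // Hypothetical sketch: submit one "job" per policy and track them
    // concurrently, rather than accumulating all files across policies
    // and running a single job sequentially afterwards.
    public static List<String> runAll(List<String> policies) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, policies.size()));
        List<Future<String>> jobs = new ArrayList<>();
        for (String policy : policies) {
            // In RaidNode this submission would be DistRaid.doDistRaid()
            // for the files selected by RaidNode.selectFiles(policy).
            jobs.add(pool.submit(() -> "raided:" + policy));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> job : jobs) {
            // A dedicated tracking thread would poll job status here
            // instead of blocking the submission path.
            results.add(job.get());
        }
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(List.of("policyA", "policyB")));
    }
}
```

The key point of the pattern is that submission for policy N+1 no longer waits on file selection and job completion for policy N; only the tracking step joins the in-flight jobs.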

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
