[ https://issues.apache.org/jira/browse/MAPREDUCE-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927245#action_12927245 ]
Scott Chen commented on MAPREDUCE-1892:
---------------------------------------
{code}
+ List<PolicyInfo> allPolicies = null;
{code}
We can remove this field because it is not used.
+1 Looks good to me.
> RaidNode can allow layered policies more efficiently
> ----------------------------------------------------
>
> Key: MAPREDUCE-1892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1892
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/raid
> Reporter: Ramkumar Vadali
> Assignee: Ramkumar Vadali
> Attachments: MAPREDUCE-1892.patch
>
>
> The RaidNode policy file can have layered policies that can cover a file more
> than once. To avoid processing a file multiple times (for RAIDing), RaidNode
> maintains a list of processed files that is used to avoid duplicate
> processing attempts.
> This is problematic because the list grows with the number of processed
> files, so a large number of them could cause the RaidNode to run out of memory.
> This task proposes a more memory-efficient method of detecting processed files. The method
> is based on the observation that a more selective policy will have a better
> match with a file name than a less selective one. Specifically, the more
> selective policy will have a longer common prefix with the file name.
> So to detect if a file has already been processed, the RaidNode only needs to
> maintain a list of processed policies and compare the lengths of the common
> prefixes. If the file has a longer common prefix with one of the processed
> policies than with the current policy, it can be assumed to be processed
> already.
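
The prefix comparison described above could be sketched roughly as follows. This is only an illustration, not the code in MAPREDUCE-1892.patch: the class name, the method names, and the use of plain path strings for policies are all hypothetical. It also assumes the common prefix is measured in whole path components rather than characters, so that sibling directories with similar names do not count as a longer match.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and types are assumptions, not taken from the patch.
class LayeredPolicySketch {

    // Split a path into its non-empty components, e.g. "/a/b/" -> ["a", "b"].
    static String[] components(String path) {
        return Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    // Number of leading path components shared by the two paths.
    static int commonPrefixLength(String a, String b) {
        String[] ca = components(a);
        String[] cb = components(b);
        int n = Math.min(ca.length, cb.length);
        int i = 0;
        while (i < n && ca[i].equals(cb[i])) {
            i++;
        }
        return i;
    }

    // A file is assumed to be processed already if some processed policy
    // shares a longer common prefix with it than the current policy does.
    static boolean assumedProcessed(String filePath,
                                    String currentPolicyPath,
                                    List<String> processedPolicyPaths) {
        int current = commonPrefixLength(filePath, currentPolicyPath);
        for (String processed : processedPolicyPaths) {
            if (commonPrefixLength(filePath, processed) > current) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assume the more selective policy "/user/data/logs" ran first.
        List<String> processed = Arrays.asList("/user/data/logs");
        String current = "/user/data";

        // Covered by the more selective policy: prints true.
        System.out.println(
            assumedProcessed("/user/data/logs/part-0", current, processed));
        // Only covered by the current policy: prints false.
        System.out.println(
            assumedProcessed("/user/data/other/part-0", current, processed));
    }
}
```

With this approach the RaidNode would keep one entry per processed policy instead of one per processed file, which is where the memory saving comes from.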