[ https://issues.apache.org/jira/browse/MAPREDUCE-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758015#action_12758015 ]
eric baldeschwieler commented on MAPREDUCE-989:
-----------------------------------------------
Philip's idea seems interesting.
Rather than having users specify map vs. reduce, would it be possible to simply
load an item lazily when it is first needed? Asking users to configure more
settings seems awkward.
In a previous system, we had a call to get the path to a cached object. If the
object was not in the cache, it was downloaded upon the first request. This
would allow objects to be used whenever needed without configuration.
Note: This would even allow one to shard cached objects into sets, if for
example different reducers need different data, which is often the case.
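A minimal sketch of the lazy-loading idea, in Java. This is not the Hadoop
DistributedCache API: the LazyCache class, its Fetcher hook, and the shard
naming scheme are all hypothetical names invented for illustration. The
Fetcher stands in for whatever would copy an object from the shared store to
local disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical lazy cache; NOT part of the Hadoop DistributedCache API. */
public class LazyCache {

    /** Hypothetical hook standing in for "download the object to local disk". */
    public interface Fetcher {
        void fetch(String name, Path target) throws IOException;
    }

    private final Path localDir;
    private final Fetcher fetcher;
    // Maps object name -> local path, filled in on first request only.
    private final ConcurrentMap<String, Path> localized = new ConcurrentHashMap<>();

    public LazyCache(Path localDir, Fetcher fetcher) {
        this.localDir = localDir;
        this.fetcher = fetcher;
    }

    /** Returns the local path of a cached object, fetching it on first use. */
    public Path getLocalPath(String name) {
        return localized.computeIfAbsent(name, n -> {
            Path target = localDir.resolve(n);
            try {
                fetcher.fetch(n, target);
            } catch (IOException e) {
                // A failed fetch records no mapping, so a later call retries.
                throw new UncheckedIOException(e);
            }
            return target;
        });
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-cache");
        AtomicInteger fetches = new AtomicInteger();
        // Stand-in fetcher: writes a small file instead of downloading one.
        LazyCache cache = new LazyCache(dir, (name, target) -> {
            fetches.incrementAndGet();
            Files.write(target, name.getBytes(StandardCharsets.UTF_8));
        });

        // Sharding: each reducer asks only for the shard it needs
        // (the naming scheme here is an assumption).
        int partition = 3;
        String shard = "lookup-shard-" + partition;
        cache.getLocalPath(shard);
        cache.getLocalPath(shard); // second call hits the local copy
        System.out.println("fetches = " + fetches.get());
    }
}
```

With this shape, neither maps nor reduces declare anything up front; a task
pays for a download only if it actually asks for the object.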
Downsides:
- Perhaps an API change?
- The JobTracker (JT) would have less information for later optimizations
> Allow segregation of DistributedCache for maps and reduces
> ----------------------------------------------------------
>
> Key: MAPREDUCE-989
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-989
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Reporter: Arun C Murthy
>
> Applications might need different files in the DistributedCache for maps and
> for reduces. We should allow them to specify the two sets separately.