[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071174#comment-14071174
 ] 

Chris Douglas commented on MAPREDUCE-5974:
------------------------------------------

bq. Doing fallback as the records are emitted would be pretty neat, but may 
also be somewhat difficult. [snip]

*nod* Fair enough, though if each MapTask is making independent decisions about 
the collector, they still need to agree on the format for the shuffle. Spilling 
one collector to disk and changing strategies should be compatible, assuming 
there isn't a different format for intermediate spills. But yeah, this is very 
abstract, given the use cases we have.

If the goal is to support a fallback collector when native libs aren't 
available, then given the dependency on the intermediate format, should the 
swap be internal to the native collector, even in init? If the interface were 
like the serialization framework's, then one might use the key type, etc. to 
pick the most appropriate collector. As failover, I'm struggling to come up 
with a case that's not covered by making this an internal detail of the native 
collector.
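
To make the "internal detail" option concrete, here is a minimal sketch of a 
collector that swaps itself for a pure-Java delegate during init. Everything in 
it is hypothetical: the Collector interface is a simplified stand-in, not 
Hadoop's MapOutputCollector, and the nativeLibsAvailable flag stands in for 
whatever load check a real native collector would perform.

```java
import java.util.ArrayList;
import java.util.List;

public class InternalFallbackSketch {
    /** Hypothetical stand-in for a collector; not Hadoop's MapOutputCollector. */
    public interface Collector {
        void init();
        void collect(String key, String value);
    }

    /** Stand-in for the default pure-Java collector. */
    public static class JavaCollector implements Collector {
        public final List<String> records = new ArrayList<>();

        @Override public void init() {}

        @Override public void collect(String key, String value) {
            records.add(key + "\t" + value);
        }
    }

    /** Stand-in for a native collector that swaps itself out during init. */
    public static class NativeCollector implements Collector {
        private final boolean nativeLibsAvailable; // assumption: result of a load check
        private Collector delegate;                // non-null only after falling back
        public final List<String> nativeRecords = new ArrayList<>();

        public NativeCollector(boolean nativeLibsAvailable) {
            this.nativeLibsAvailable = nativeLibsAvailable;
        }

        @Override public void init() {
            if (!nativeLibsAvailable) {
                // Fall back internally; the caller keeps using this object.
                delegate = new JavaCollector();
                delegate.init();
            }
        }

        @Override public void collect(String key, String value) {
            if (delegate != null) {
                delegate.collect(key, value);
            } else {
                // Placeholder for the native path; in this sketch both paths
                // emit the same record format.
                nativeRecords.add(key + "\t" + value);
            }
        }

        public boolean usingFallback() {
            return delegate != null;
        }
    }

    public static void main(String[] args) {
        NativeCollector c = new NativeCollector(false);
        c.init();
        c.collect("k", "v");
        System.out.println("fallback in use: " + c.usingFallback());
    }
}
```

With this shape the job configuration still names a single collector class, and 
the choice of intermediate format stays under the control of one 
implementation.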

> Allow map output collector fallback
> -----------------------------------
>
>                 Key: MAPREDUCE-5974
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5974
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: task
>    Affects Versions: 2.6.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: mapreduce-5974.txt
>
>
> Currently we only allow specifying a single MapOutputCollector implementation 
> class in a job. It would be nice to allow a comma-separated list of classes: 
> we should try each collector implementation in the user-specified order until 
> we find one that can be successfully instantiated and initialized.
> This is useful for cases where a particular optimized collector 
> implementation cannot operate on all key/value types, or requires native 
> code. The cluster administrator can configure the cluster to try to use the 
> optimized collector and fall back to the default collector.
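
The ordered fallback described in the issue can be sketched as follows. This is 
an illustration only, not the attached patch: the Collector interface, the two 
collector classes, and the select method are hypothetical stand-ins, and the 
real MapOutputCollector API and configuration handling are not shown.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorFallbackSketch {
    /** Hypothetical stand-in for a collector; not Hadoop's MapOutputCollector. */
    public interface Collector {
        void init() throws Exception;
    }

    /** Stand-in for an optimized collector that cannot initialize here. */
    public static class UnavailableCollector implements Collector {
        @Override public void init() throws Exception {
            throw new UnsupportedOperationException("native library not loaded");
        }
    }

    /** Stand-in for the default collector, which always initializes. */
    public static class DefaultCollector implements Collector {
        @Override public void init() {}
    }

    /**
     * Tries each class in a comma-separated list, in order, and returns the
     * first one that can be both instantiated and initialized.
     */
    public static Collector select(String classNames) {
        List<String> failures = new ArrayList<>();
        for (String raw : classNames.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            try {
                Collector c = (Collector) Class.forName(name)
                        .getDeclaredConstructor().newInstance();
                c.init();
                return c;
            } catch (Exception e) {
                // Remember why this candidate was rejected, then try the next.
                failures.add(name + ": " + e);
            }
        }
        throw new IllegalStateException(
                "No collector could be initialized: " + failures);
    }

    public static void main(String[] args) {
        String spec = UnavailableCollector.class.getName()
                + ", " + DefaultCollector.class.getName();
        System.out.println(select(spec).getClass().getSimpleName());
    }
}
```

Note that this selection runs independently in each task, so every candidate 
in the list would need to produce output that the shuffle can read.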



--
This message was sent by Atlassian JIRA
(v6.2#6252)
