[
https://issues.apache.org/jira/browse/HADOOP-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612987#action_12612987
]
Chris Douglas commented on HADOOP-3702:
---------------------------------------
Very basic question: what are the semantics of a mapper in this chain calling
collect(K,V)? Currently, it is guaranteed that neither the key nor the value
will be modified, so the following must hold:
{code}
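// inside map(): the same key/value objects are reused across records
key.set(some_value);
value.set(some_other_value);
output.collect(key, value);   // OutputCollector.collect(K, V)
// current contract: collect() leaves both arguments unmodified
assert key.get().equals(some_value);
assert value.get().equals(some_other_value);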
{code}
Chaining mappers can violate this property unless the following maps guarantee
(by convention, presumably) that they will not modify either argument. It might
make sense to require chained mappers (excluding the final mapper) to implement
a different interface, even an empty marker interface, that promises to treat
the record as const.
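A minimal sketch of the concern and of the marker-interface idea, in plain Java with no Hadoop dependency. Every name here is hypothetical and for illustration only: `RecordCollector` stands in for `OutputCollector.collect(K, V)`, `SimpleMapper` for a mapper, `Box` for a reusable Writable, and `ConstRecordMapper` is the empty marker interface suggested above; none of them come from the attached patch. A chain builder that checks for the marker can reject a mutating mapper when the chain is wired, rather than letting it silently break the upstream mapper's assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for OutputCollector<K, V>.collect(K, V).
interface RecordCollector<K, V> {
    void collect(K key, V value);
}

// Hypothetical stand-in for a mapper in the chain.
interface SimpleMapper<K, V> {
    void map(K key, V value, RecordCollector<K, V> out);
}

// Hypothetical empty marker interface: implementors promise not to modify
// the key or value objects they receive.
interface ConstRecordMapper {}

// Mutable, reusable holder, analogous to a Writable such as Text.
final class Box {
    String v;
    Box(String v) { this.v = v; }
}

final class ChainConstDemo {
    // Violates the contract: upper-cases the incoming key in place.
    static final class MutatingMapper implements SimpleMapper<Box, Box> {
        public void map(Box k, Box v, RecordCollector<Box, Box> out) {
            k.v = k.v.toUpperCase();
            out.collect(k, v);
        }
    }

    // Honors the contract: emits a fresh key and leaves its inputs untouched.
    static final class CopyingMapper implements SimpleMapper<Box, Box>, ConstRecordMapper {
        public void map(Box k, Box v, RecordCollector<Box, Box> out) {
            out.collect(new Box(k.v.toUpperCase()), v);
        }
    }

    // Wires a downstream mapper behind an upstream collect() call, but only
    // if the downstream mapper carries the marker interface.
    static RecordCollector<Box, Box> chain(SimpleMapper<Box, Box> downstream,
                                           RecordCollector<Box, Box> finalOut) {
        if (!(downstream instanceof ConstRecordMapper)) {
            throw new IllegalArgumentException(
                downstream.getClass().getSimpleName() + " does not promise const records");
        }
        return (k, v) -> downstream.map(k, v, finalOut);
    }

    // Upstream mapper body: reuses key/value objects and relies on the current
    // contract that collect() leaves both arguments unchanged.
    static boolean upstreamInvariantHolds(RecordCollector<Box, Box> out) {
        Box key = new Box("");
        Box value = new Box("");
        key.v = "k1";
        value.v = "v1";
        out.collect(key, value);
        return key.v.equals("k1") && value.v.equals("v1");
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        RecordCollector<Box, Box> finalOut = (k, v) -> sink.add(k.v + "=" + v.v);

        // Unchecked wiring: the mutating mapper breaks the upstream assumption.
        RecordCollector<Box, Box> naive = (k, v) -> new MutatingMapper().map(k, v, finalOut);
        System.out.println("naive mutating chain: " + upstreamInvariantHolds(naive));

        // Checked wiring accepts the copying mapper; the assumption holds.
        System.out.println("checked copying chain: "
            + upstreamInvariantHolds(chain(new CopyingMapper(), finalOut)));

        // Checked wiring rejects the mutating mapper before any record flows.
        try {
            chain(new MutatingMapper(), finalOut);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running `main` should print `false`, then `true`, then the rejection message for `MutatingMapper`. Note that a check like this only verifies that the promise was declared, not that it is kept; honoring it is still up to each mapper (here, `CopyingMapper` emits a fresh key instead of modifying its input).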
> add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
> -------------------------------------------------------------------------
>
> Key: HADOOP-3702
> URL: https://issues.apache.org/jira/browse/HADOOP-3702
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Environment: all
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Priority: Minor
> Attachments: patch3702.txt
>
>
> On the same input, we usually need to run multiple Maps one after the other
> with no Reduce in between. We also have to run multiple Maps after the Reduce.
> If all pre-Reduce Maps are chained together and run as a single Map, a
> significant amount of disk I/O will be avoided.
> Similarly, all post-Reduce Maps can be chained together and run in the Reduce
> phase, after the Reduce.
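To illustrate the [M*/RM*] shape described above, here is a small, self-contained Java sketch. It is not the patch's API: `Emitter`, `StageMapper`, and `MapChainSketch` are hypothetical names. `compose` turns a list of mappers into a single collector so that each record passes through every Map in memory; a plain `TreeMap` stands in for the shuffle, and the reducer's output feeds a second composed chain directly, so nothing is written out between chained stages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simplified output interface (not Hadoop's OutputCollector).
interface Emitter {
    void emit(String key, String value);
}

// Hypothetical simplified mapper stage.
interface StageMapper {
    void map(String key, String value, Emitter out);
}

final class MapChainSketch {
    // Builds one Emitter that pushes each record through mappers[0..n-1]
    // in memory and finally into 'sink'; no stage output is materialized.
    static Emitter compose(List<StageMapper> mappers, Emitter sink) {
        Emitter next = sink;
        for (int i = mappers.size() - 1; i >= 0; i--) {
            StageMapper m = mappers.get(i);
            Emitter downstream = next;
            next = (k, v) -> m.map(k, v, downstream);
        }
        return next;
    }

    // Word count shaped as M M R M M.
    static List<String> runExample() {
        // Pre-reduce chain: split lines into words, then drop words of 2 chars or fewer.
        List<StageMapper> pre = List.of(
            (k, line, out) -> { for (String w : line.split(" ")) out.emit(w, "1"); },
            (w, one, out) -> { if (w.length() > 2) out.emit(w, one); });

        // Stand-in for the shuffle: group values by key, sorted by key.
        Map<String, List<String>> groups = new TreeMap<>();
        Emitter mapSide = compose(pre,
            (k, v) -> groups.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        mapSide.emit("0", "the cat sat on the mat");
        mapSide.emit("1", "the cat ran");

        // Post-reduce chain: keep counts above 1, then format them.
        List<String> results = new ArrayList<>();
        List<StageMapper> post = List.of(
            (w, n, out) -> { if (Integer.parseInt(n) > 1) out.emit(w, n); },
            (w, n, out) -> out.emit(w, "count=" + n));
        Emitter reduceSide = compose(post, (k, v) -> results.add(k + " " + v));

        // Reducer: sum the values for each word and emit into the post-reduce chain.
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            int sum = 0;
            for (String one : e.getValue()) sum += Integer.parseInt(one);
            reduceSide.emit(e.getKey(), Integer.toString(sum));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runExample());
    }
}
```

For the two sample lines, `main` prints `[cat count=2, the count=3]`. The sketch uses immutable `String` keys and values, which sidesteps the object-reuse question raised in the comment; with mutable Writables, the same wiring would need the const guarantee discussed there.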
--
This message is automatically generated by JIRA.