[jira] Commented: (HADOOP-3702) add support for chaining Maps in a single Map and after a Reduce [M*/RM*]

Enis Soztutar (JIRA) Mon, 04 Aug 2008 07:43:06 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619529#action_12619529
 ]


Enis Soztutar commented on HADOOP-3702:
---------------------------------------

# Having a distinct serializer for Configuration is not desirable. Configuration 
is already marshaled to and unmarshaled from string form as XML, so having 
Configuration implement Writable is the right choice here. There is no ambiguity 
between the Configuration.write() overloads: we can keep the current 
write(OutputStream out) method and add:
{code}
public void write(final DataOutput out) throws IOException {
  // Adapt the DataOutput to an OutputStream so the existing XML writer is reused.
  write(new OutputStream() {
    @Override
    public void write(int b) throws IOException {
      out.writeByte(b);
    }
  });
}
{code}
readFields() can be implemented by factoring out the common functionality of 
the loadProperties() method: instead of reading from a URL or file, the XML 
will be parsed from the DataInput. 
# It would be better if ChainMapper, ChainReducer, and ChainOutputCollector 
used generics, similar to the Mapper, Reducer, and OutputCollector classes. 
# Creating a new ChainOutputCollector() at each collect() call might be 
suboptimal. Could this not be done once for each Map/Reduce task? 
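
On the first point, the reading side also needs to know where the XML ends, 
since a DataInput usually carries other records after this one. Below is a 
minimal sketch of one way to handle that, not the patch's code. 
XmlWritableSketch is a hypothetical stand-in that uses java.util.Properties 
instead of Hadoop's Configuration, and the length-prefixed framing is an 
assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical stand-in for a Writable Configuration; this is not Hadoop code.
public class XmlWritableSketch {
    private final Properties props = new Properties();

    public void set(String name, String value) {
        props.setProperty(name, value);
    }

    public String get(String name) {
        return props.getProperty(name);
    }

    // Buffer the XML form, then write its byte length followed by the bytes,
    // so the reader knows exactly how much of the stream belongs to this object.
    public void write(DataOutput out) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        props.storeToXML(buffer, null);
        byte[] xml = buffer.toByteArray();
        out.writeInt(xml.length);
        out.write(xml);
    }

    // Read exactly the announced number of bytes and parse them as XML,
    // replacing any properties that were set before.
    public void readFields(DataInput in) throws IOException {
        byte[] xml = new byte[in.readInt()];
        in.readFully(xml);
        props.clear();
        props.loadFromXML(new ByteArrayInputStream(xml));
    }

    public static void main(String[] args) throws IOException {
        XmlWritableSketch original = new XmlWritableSketch();
        original.set("example.key", "example.value");

        // Write the object followed by an unrelated record on the same stream.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        original.write(out);
        out.writeUTF("next-record");
        out.flush();

        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        XmlWritableSketch copy = new XmlWritableSketch();
        copy.readFields(in);

        // The record after the XML is still readable from the same stream.
        System.out.println(copy.get("example.key") + " / " + in.readUTF());
    }
}
```

Buffering costs one extra copy of the XML, but it avoids the byte-at-a-time 
adapter and leaves the stream positioned at the start of the next record.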

> add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3702
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Minor
>         Attachments: patch3702.txt, patch3702.txt, patch3702.txt, 
> patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, 
> patch3702.txt, patch3702.txt
>
>
> On the same input, we usually need to run multiple Maps one after the other 
> without any Reduce. We also have to run multiple Maps after the Reduce.
> If all pre-Reduce Maps are chained together and run as a single Map, a 
> significant amount of disk I/O will be avoided. 
> Similarly all post-Reduce Maps can be chained together and run in the Reduce 
> phase after the Reduce.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
