[
https://issues.apache.org/jira/browse/HADOOP-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617633#action_12617633
]
Chris Douglas commented on HADOOP-3702:
---------------------------------------
* Instead of adding WritableUtils::asString, it might make more sense to use
the o.a.h.io.Stringifier interfaces, particularly since you create a new
JobConf (permitting a user to pass in an object would be an excellent, but not
strictly necessary, addition to Stringifier, too)
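The kind of round trip Stringifier provides can be sketched as follows. This is a hypothetical stand-in, not Hadoop's actual encoding: Base64 over plain Java serialization plays the role of turning an object into a String that can live in a Configuration property, and the class and method names are illustrative only.

```java
import java.io.*;
import java.util.Base64;

// Hypothetical sketch of a Stringifier-style round trip: an object is
// serialized to a String (so it can be stored as a conf property), then
// rebuilt on the other side. Base64 over Java serialization stands in
// for whatever encoding the real o.a.h.io.Stringifier uses.
public class StringifierSketch {
    static String stringify(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            return Base64.getEncoder().encodeToString(bos.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object destringify(String s) {
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(Base64.getDecoder().decode(s)))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```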
* I'm not sure I understand how the configuration deserialization works. In
Chain::getChainElementConf, a new config is created, its fields cleared and
populated from the stream, then each property defined in the deserialized conf
is (re)defined on a clone of the JobConf passed in (presumably to permit final,
etc. to be observed). Is that accurate?
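If that reading is right, the pattern looks roughly like the sketch below, with {{java.util.Properties}} standing in for JobConf (the merge helper is hypothetical): properties pulled off the stream are (re)defined one by one on a copy of the caller's conf, so any semantics the conf itself enforces (final properties, defaults) still apply.

```java
import java.util.Properties;

// Minimal sketch of the deserialization pattern described above, with
// java.util.Properties standing in for JobConf. Each property from the
// deserialized conf is (re)defined on a clone of the conf passed in,
// leaving the original untouched.
public class ChainConfSketch {
    static Properties merge(Properties base, Properties deserialized) {
        Properties clone = new Properties();
        clone.putAll(base);  // clone of the JobConf passed in
        for (String name : deserialized.stringPropertyNames()) {
            // (re)define each property from the stream on the clone
            clone.setProperty(name, deserialized.getProperty(name));
        }
        return clone;
    }
}
```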
* Is it true that each call to {{map}} creates a series of
{{ChainOutputCollectors}}? These look like lightweight objects, but is there a
reason the pipeline needs to be recreated each time?
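For illustration, the pipeline shape in question can be sketched like this (names and structure are assumptions, not the patch's actual classes): each collector wraps the next map stage, so output flows map, collector, map, and so on. Composed this way the chain holds no per-call state, which is why it seems like it could be built once and reused rather than recreated on every {{map}} call.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a ChainOutputCollector-style pipeline: the chain
// is composed back-to-front so that each collector applies its map function
// and hands the result to the downstream collector, ending in a sink.
public class ChainPipelineSketch {
    interface Collector { void collect(String value); }

    static Collector buildChain(List<Function<String, String>> maps,
                                List<String> sink) {
        Collector next = sink::add;  // final stage gathers output
        for (int i = maps.size() - 1; i >= 0; i--) {
            final Function<String, String> map = maps.get(i);
            final Collector downstream = next;
            next = value -> downstream.collect(map.apply(value));
        }
        return next;  // stateless: could be built once and reused
    }
}
```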
* It looks like this broke the eclipse plugin:
{panel}
[exec] compile:
[exec] [echo] contrib: eclipse-plugin
[exec] [javac] Compiling 30 source files to
/zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/build/contrib/eclipse-plugin/classes
[exec] [javac]
/zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/server/HadoopServer.java:422:
write(java.io.DataOutput) in org.apache.hadoop.conf.Configuration cannot be
applied to (java.io.FileOutputStream)
[exec] [javac] this.conf.write(fos);
[exec] [javac] ^
[exec] [javac]
/zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/servers/RunOnHadoopWizard.java:166:
write(java.io.DataOutput) in org.apache.hadoop.conf.Configuration cannot be
applied to (java.io.FileOutputStream)
[exec] [javac] conf.write(fos);
[exec] [javac] ^
{panel}
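The errors suggest {{Configuration::write}} now takes a {{java.io.DataOutput}}, so the likely fix at those call sites is to wrap the raw {{FileOutputStream}} first. A minimal sketch of that wrapping, with a plain {{DataOutput}} consumer standing in for {{Configuration::write}} (class and method names here are illustrative):

```java
import java.io.*;

// Sketch of the likely call-site fix: a method that takes java.io.DataOutput
// cannot accept a raw FileOutputStream, but it can accept the stream wrapped
// in a DataOutputStream. A simple writer stands in for Configuration::write.
public class WriteFixSketch {
    static void write(DataOutput out) throws IOException {
        out.writeUTF("some.property=value");  // stand-in for the conf's contents
    }

    static byte[] writeToBytes() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // write(bos) would not compile; wrapping fixes it, i.e. the call sites
        // would become conf.write(new DataOutputStream(fos)):
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            write(dos);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```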
> add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
> -------------------------------------------------------------------------
>
> Key: HADOOP-3702
> URL: https://issues.apache.org/jira/browse/HADOOP-3702
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Environment: all
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Priority: Minor
> Attachments: patch3702.txt, patch3702.txt, patch3702.txt,
> patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt,
> patch3702.txt
>
>
> On the same input, we usually need to run multiple Maps one after the other
> without a Reduce. We also have to run multiple Maps after the Reduce.
> If all pre-Reduce Maps are chained together and run as a single Map, a
> significant amount of disk I/O is avoided.
> Similarly, all post-Reduce Maps can be chained together and run in the Reduce
> phase after the Reduce.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.