[ http://issues.apache.org/jira/browse/HADOOP-115?page=all ]

Doug Cutting updated HADOOP-115:
--------------------------------

        Summary: permit reduce input types to differ from reduce output types
(was: Hadoop should allow the user to use SequenceFileOutputFormat as the 
output format and to choose key/value classes that are different from those 
for map output.)
           type: New Feature  (was: Improvement)
    Description: 
When map tasks write intermediate data, they always use a SequenceFile 
RecordWriter with the key/value classes taken from the job object.

When reducers write the final results, the output format is also obtained 
from the job object. The default is TextOutputFormat, which causes no 
conflict. However, if one wants to use SequenceFileOutputFormat for the final 
results, the key/value classes are again taken from the job object, so they 
are the same as those of the map output. As a result, map outputs and reduce 
outputs cannot use different key/value classes when the reducers write 
their output with SequenceFileOutputFormat.

A simple fix would be to add two more attributes to the JobConf class: 
mapOutputKeyClass and mapOutputValueClass. That would let the user choose 
different key/value classes for the intermediate and final outputs.
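
The proposed fix can be sketched roughly as follows. This is a minimal, 
self-contained illustration and not the attached patch: SketchJobConf is a 
hypothetical stand-in for JobConf, and the fallback from the map output 
classes to the job's output classes is an assumption made here so that jobs 
that never set the new attributes keep their current behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the job configuration; this is NOT Hadoop's JobConf.
class SketchJobConf {
    private final Map<String, Class<?>> classes = new HashMap<>();

    // Existing attributes: classes of the final (reduce) output.
    void setOutputKeyClass(Class<?> c) { classes.put("outputKey", c); }
    void setOutputValueClass(Class<?> c) { classes.put("outputValue", c); }
    Class<?> getOutputKeyClass() { return classes.get("outputKey"); }
    Class<?> getOutputValueClass() { return classes.get("outputValue"); }

    // Proposed attributes: classes of the intermediate (map) output.
    void setMapOutputKeyClass(Class<?> c) { classes.put("mapOutputKey", c); }
    void setMapOutputValueClass(Class<?> c) { classes.put("mapOutputValue", c); }

    // Assumption: when no map output class is set, fall back to the final
    // output class, which matches the single-setting behaviour described above.
    Class<?> getMapOutputKeyClass() {
        return classes.getOrDefault("mapOutputKey", getOutputKeyClass());
    }
    Class<?> getMapOutputValueClass() {
        return classes.getOrDefault("mapOutputValue", getOutputValueClass());
    }
}

class SketchJobConfDemo {
    public static void main(String[] args) {
        SketchJobConf conf = new SketchJobConf();
        conf.setOutputKeyClass(String.class);
        conf.setOutputValueClass(Double.class);

        // Intermediate values now differ from the final output values.
        conf.setMapOutputValueClass(Long.class);

        System.out.println("map output key:   " + conf.getMapOutputKeyClass().getSimpleName());
        System.out.println("map output value: " + conf.getMapOutputValueClass().getSimpleName());
        System.out.println("final key:        " + conf.getOutputKeyClass().getSimpleName());
        System.out.println("final value:      " + conf.getOutputValueClass().getSimpleName());
    }
}
```

With this arrangement the map-side writer would read the map output classes 
and the reduce-side writer the final output classes, so the two no longer 
have to agree.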




> permit reduce input types to differ from reduce output types
> ------------------------------------------------------------
>
>          Key: HADOOP-115
>          URL: http://issues.apache.org/jira/browse/HADOOP-115
>      Project: Hadoop
>         Type: New Feature

>   Components: mapred
>     Reporter: Runping Qi
>     Assignee: Runping Qi
>  Attachments: hadoop-115_ReduceTask.patch, hadoop-115_tk.patch, 
> patch_115.txt.2006_05_16

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
