[ https://issues.apache.org/jira/browse/HADOOP-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-1290:
-------------------------------

    Attachment: patch-1284.txt


This patch implemented the proposed protocol.

With this patch, the streaming user can specify a field separator for the
mapper's output and/or a field separator for the reducer's output. The
default is the tab character.

The user can also specify how many leading fields of an output line
constitute the key. The default is 1. The rest of the line becomes the value.
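A minimal sketch of the key/value split described above, written in plain
Java so it runs without Hadoop. The class name KeyValueSplitter is
hypothetical and this is not the patch's code. It also assumes that a line
with fewer fields than the key width becomes the whole key with an empty
value.

```java
/**
 * Hypothetical helper (not code from the patch): splits one line of
 * streaming output into a key and a value.
 */
public final class KeyValueSplitter {

    private final String separator;
    private final int numKeyFields;

    /** Defaults described in the patch: tab separator, one key field. */
    public KeyValueSplitter() {
        this("\t", 1);
    }

    public KeyValueSplitter(String separator, int numKeyFields) {
        if (separator.isEmpty() || numKeyFields < 1) {
            throw new IllegalArgumentException(
                "separator must be non-empty and numKeyFields >= 1");
        }
        this.separator = separator;
        this.numKeyFields = numKeyFields;
    }

    /** Returns {key, value}. */
    public String[] split(String line) {
        int from = 0;    // where the next search for a separator starts
        int sepAt = -1;  // position of the separator that ends the key
        for (int i = 0; i < numKeyFields; i++) {
            sepAt = line.indexOf(separator, from);
            if (sepAt < 0) {
                // Assumption: too few fields means the whole line is the key.
                return new String[] {line, ""};
            }
            from = sepAt + separator.length();
        }
        return new String[] {line.substring(0, sepAt), line.substring(from)};
    }

    public static void main(String[] args) {
        String[] kv = new KeyValueSplitter(".", 2).split("10.2.3.4");
        System.out.println(kv[0] + " | " + kv[1]); // prints "10.2 | 3.4"
    }
}
```

With a separator of "." and two key fields, the line 10.2.3.4 yields the key
10.2 and the value 3.4.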

A partitioner class, KeyFieldBasedPartitioner in mapred.lib, is also
implemented. The user can specify how many fields of the map output key
are used for partitioning.
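The partitioning idea can be sketched the same way. KeyFieldPartitionSketch
is a hypothetical stand-in and not the KeyFieldBasedPartitioner source. It
hashes only the first N fields of the key and masks the sign bit the way
Hadoop's HashPartitioner does, so keys that agree on those fields land on
the same reducer.

```java
/**
 * Hypothetical sketch (not the KeyFieldBasedPartitioner source): chooses a
 * reduce partition from the first numPartitionFields fields of a key.
 */
public final class KeyFieldPartitionSketch {

    private final String separator;
    private final int numPartitionFields;

    public KeyFieldPartitionSketch(String separator, int numPartitionFields) {
        if (separator.isEmpty() || numPartitionFields < 1) {
            throw new IllegalArgumentException(
                "separator must be non-empty and numPartitionFields >= 1");
        }
        this.separator = separator;
        this.numPartitionFields = numPartitionFields;
    }

    /** The first numPartitionFields fields of key, or all of key if it has fewer. */
    String partitionPrefix(String key) {
        int from = 0;
        for (int i = 0; i < numPartitionFields; i++) {
            int sepAt = key.indexOf(separator, from);
            if (sepAt < 0) {
                return key; // fewer fields than requested: use the whole key
            }
            if (i == numPartitionFields - 1) {
                return key.substring(0, sepAt);
            }
            from = sepAt + separator.length();
        }
        return key; // not reached when numPartitionFields >= 1
    }

    /** Same hash-and-modulo scheme as HashPartitioner, applied to the prefix only. */
    public int getPartition(String key, int numReduceTasks) {
        return (partitionPrefix(key).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        KeyFieldPartitionSketch p = new KeyFieldPartitionSketch("\t", 2);
        // Both keys share the fields "us" and "ca", so they go to the same reducer.
        System.out.println(
            p.getPartition("us\tca\t2007", 10) == p.getPartition("us\tca\t2006", 10)); // prints "true"
    }
}
```

Because the remaining key fields take no part in partitioning, the reducer
still receives and sorts them, so one reducer can see all records for a
given field prefix in key order.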

Also, a utility class, FieldSelectionMapReduce in mapred.lib, is added. This
class allows the user to create map/reduce jobs that manipulate text data
like the Unix cut utility. The user can specify the field separator (the
delimiter for cut), which fields to select, and by which fields to
partition/sort.
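A cut-like field selection can be sketched as follows. FieldSelectSketch and
its 0-based index arguments are assumptions for illustration. The patch's
FieldSelectionMapReduce has its own way of specifying fields, and this
sketch does not reproduce it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of cut-style field selection (not the patch's code). */
public final class FieldSelectSketch {

    /** Joins the fields at the given 0-based indexes with separator; out-of-range indexes are skipped. */
    static String select(String line, String separator, int... fields) {
        // Pattern.quote treats the separator literally; limit -1 keeps trailing empty fields.
        String[] parts = line.split(Pattern.quote(separator), -1);
        List<String> picked = new ArrayList<>();
        for (int f : fields) {
            if (f >= 0 && f < parts.length) {
                picked.add(parts[f]);
            }
        }
        return String.join(separator, picked);
    }

    /** A map step in this style: one selection becomes the key, another the value. */
    static String[] toKeyValue(String line, String separator, int[] keyFields, int[] valueFields) {
        return new String[] {
            select(line, separator, keyFields),
            select(line, separator, valueFields)
        };
    }

    public static void main(String[] args) {
        String line = "u1,2007-04-25,GET,/index.html";
        String[] kv = toKeyValue(line, ",", new int[] {2}, new int[] {0, 3});
        System.out.println(kv[0] + " -> " + kv[1]); // prints "GET -> u1,/index.html"
    }
}
```

Choosing the key fields here is what decides how records are grouped and
sorted. Combining this with a field-prefix partitioner gives the
partition/sort control described above.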

Two unit tests are introduced.
All the unit tests passed.


> Move Hadoop Abacus to hadoop.mapred.lib
> ---------------------------------------
>
>                 Key: HADOOP-1290
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1290
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Runping Qi
>
> Owen and I discussed this issue and we both felt that it is appropriate to 
> move Hadoop Abacus to the hadoop main framework.
> Any comments/thoughts/concerns/objections?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
