[ https://issues.apache.org/jira/browse/HADOOP-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Runping Qi updated HADOOP-1290:
-------------------------------

    Attachment: patch-1284.txt

This patch implements the proposed protocol. With this patch, a streaming user can specify a field separator for the mapper's output and/or a field separator for the reducer's output. The default is the tab character. The user can also specify how many fields of an output line constitute the key. The default is 1, and the rest of the line is the value.

A partitioner class, KeyFieldBasedPartitioner in mapred.lib, is also implemented. The user can specify how many fields of the map output key are used for partitioning.

A utility class, FieldSelectionMapReduce in mapred.lib, is also added. It lets the user create map/reduce jobs that manipulate text data the way the Unix cut utility does: the user specifies a field separator (the delimiter, as for cut), which fields to select, and which fields to partition and sort by.

Two unit tests are added, and all unit tests pass.

> Move Hadoop Abacus to hadoop.mapred.lib
> ---------------------------------------
>
>                 Key: HADOOP-1290
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1290
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Runping Qi
>
> Owen and I discussed this issue and we both felt that it is appropriate to
> move Hadoop Abacus to the hadoop main framework.
> Any comments/thoughts/concerns/objections?
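The field-based protocol described in the comment above (a configurable separator, the first N fields as the key, the rest of the line as the value), partitioning on a prefix of the key's fields, and cut-style field selection can be sketched in plain Java. This is a hypothetical illustration, not the code in patch-1284.txt and not the Hadoop API: the class `KeyFieldSketch` and its methods `split`, `partition`, and `selectFields` are invented names. It also assumes that a line with no more than N fields becomes the whole key with an empty value, that partitions are chosen with the `(hash & Integer.MAX_VALUE) % numReduceTasks` formula that Hadoop's `HashPartitioner` uses, and that field indices are 0-based (cut's are 1-based).

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not the Hadoop API) of the field-based key/value
 * protocol, key-field partitioning, and cut-style field selection.
 */
public class KeyFieldSketch {

    /** A line split into key and value. */
    static final class KeyValue {
        final String key;
        final String value;

        KeyValue(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * The first numKeyFields fields form the key; everything after the
     * separator that ends the key is the value. Assumption: if the line has
     * no more than numKeyFields fields, the whole line is the key and the
     * value is empty.
     */
    static KeyValue split(String line, String separator, int numKeyFields) {
        if (numKeyFields < 1) {
            throw new IllegalArgumentException("numKeyFields must be >= 1");
        }
        int end = -1;  // index of the separator that ends the key
        int from = 0;  // position where the next separator search starts
        for (int i = 0; i < numKeyFields; i++) {
            end = line.indexOf(separator, from);
            if (end < 0) {
                return new KeyValue(line, "");
            }
            from = end + separator.length();
        }
        return new KeyValue(line.substring(0, end), line.substring(from));
    }

    /** The defaults described in the comment: tab separator, one key field. */
    static KeyValue split(String line) {
        return split(line, "\t", 1);
    }

    /** Hashes only the first numPartitionFields fields of the key (assumed scheme). */
    static int partition(String key, String separator, int numPartitionFields, int numReduceTasks) {
        String prefix = split(key, separator, numPartitionFields).key;
        return (prefix.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Like `cut -d SEP -f LIST`, but with 0-based indices; out-of-range fields are skipped. */
    static String selectFields(String line, String separator, int... fields) {
        String[] parts = line.split(Pattern.quote(separator), -1);
        StringJoiner joined = new StringJoiner(separator);
        for (int f : fields) {
            if (f >= 0 && f < parts.length) {
                joined.add(parts[f]);
            }
        }
        return joined.toString();
    }

    /** Makes tabs visible when printing. */
    private static String show(String s) {
        return s.replace("\t", "\\t");
    }

    public static void main(String[] args) {
        String line = "us\tca\tsf\t42";
        KeyValue kv = split(line, "\t", 2);
        System.out.println("key=" + show(kv.key) + " value=" + show(kv.value));
        System.out.println("default key=" + show(split(line).key));
        System.out.println("partition=" + partition(kv.key, "\t", 1, 4));
        System.out.println("cut=" + show(selectFields(line, "\t", 2, 0)));
    }
}
```

With two key fields, `us\tca\tsf\t42` splits into the key `us\tca` and the value `sf\t42`. Partitioning on only the first key field sends `us\tca` and `us\tny` to the same reducer. One reason to partition on fewer fields than the sort key is that records sharing the leading fields then meet in one reducer while still being sorted by the full key.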