[ https://issues.apache.org/jira/browse/HADOOP-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johan Oskarsson resolved HADOOP-4913.
-------------------------------------

       Resolution: Won't Fix
    Fix Version/s:     (was: site)

You can do this in user code by implementing an output format that ignores the key and only saves the value. Have a look at TextOutputFormat for guidance.

> When using the Hadoop streaming jar if the reduce job outputs only a value
> (no key) the code incorrectly outputs the value along with the tab character
> (key/value) separator.
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-4913
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4913
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.18.2
>         Environment: Red Hat Linux 5.
>            Reporter: John Fisher
>            Priority: Minor
>
> I would like the output of my streaming job to only be the value, omitting
> the key and key/value separator. However, when only printing the value I am
> noticing that each line is ending with a tab character. I believe I have
> tracked down the issue (described below) but I'm not 100% sure. The fix is
> working for me though so I figured maybe it should be incorporated into the
> code base.
> The tab gets printed out because of a bad check in the TextOutputFormat code.
> It checks if the "key" and "value" objects are null. If they are both not
> null, then that means that the line should be printed as
> <key><separator><value>, otherwise it should only print the key or value,
> depending on what is defined. The bug is that the key and value are always
> defined. I traced up further to see if the error was that these objects were
> defined when they shouldn't be, but it looks like that's how it should work.
> I changed the Hadoop code to look for a null object and also a zero-length
> string.
> *** Patch code begin ***
> if( ! nullKey ) {
>   nullKey = ( key.toString().length() == 0 );
> }
> if( ! nullValue ) {
>   nullValue = ( value.toString().length() == 0 );
> }
> *** Patch code end ***
> The OutputCollector calls the TextOutputFormat.write() method with whatever
> objects are passed into it (see ReduceTask.java, line 300) so that is fine.
> But above that if you look at PipeMapRed.java, in the run() method you will
> see that the code creates a new key and value object and then starts reading
> lines and feeding them to the OutputCollector. This is why the key and value
> are always defined by the time they hit the TextOutputFormat.write() and why
> we always see the tab.
> Thanks,
> John

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
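
For readers who want to see the behaviour in the report without a Hadoop checkout, here is a minimal sketch in plain Java. It is a simplified stand-in for the write logic the report describes, not the Hadoop source: the class and method names are hypothetical, plain Objects replace Writable keys and values, and the separator is assumed to be a tab. It also assumes that a line with no separator arrives as a non-null key with an empty value; the report only says that both objects are always defined, not which one holds the text.

```java
/**
 * Simplified stand-in for the record-writing check discussed in HADOOP-4913.
 * Hypothetical names throughout; this is not the Hadoop implementation.
 */
class LineWriterSketch {
    // Assumed key/value separator (a tab, as in the report).
    static final String SEPARATOR = "\t";

    /** Check described in the report: only null counts as "missing". */
    static String formatNullCheckOnly(Object key, Object value) {
        return format(key, value, key == null, value == null);
    }

    /** Check with the reporter's patch: empty strings also count as "missing". */
    static String formatWithEmptyCheck(Object key, Object value) {
        boolean nullKey = (key == null);
        boolean nullValue = (value == null);
        if (!nullKey) {
            nullKey = (key.toString().length() == 0);
        }
        if (!nullValue) {
            nullValue = (value.toString().length() == 0);
        }
        return format(key, value, nullKey, nullValue);
    }

    // Builds one output line; the separator is written only when both sides are present.
    private static String format(Object key, Object value,
                                 boolean nullKey, boolean nullValue) {
        if (nullKey && nullValue) {
            return ""; // nothing to write for this record
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            line.append(SEPARATOR);
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    // Makes tabs and newlines visible when printing.
    static String visible(String s) {
        return s.replace("\t", "\\t").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println(visible(formatNullCheckOnly("hello", "")));  // prints hello\t\n
        System.out.println(visible(formatWithEmptyCheck("hello", ""))); // prints hello\n
    }
}
```

Note that the patched check also drops the separator for records whose key or value is legitimately empty, which changes the output for every job using the format. That may be one reason to put this behaviour in a user-supplied output format, as the resolution suggests, rather than in the shared code.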