[ https://issues.apache.org/jira/browse/HADOOP-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johan Oskarsson resolved HADOOP-4913.
-------------------------------------
Resolution: Won't Fix
Fix Version/s: (was: site)
You can do this in user code by implementing an output format that ignores the
key and only saves the value. Have a look at TextOutputFormat for guidance.
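As a rough illustration of that suggestion, here is a plain-Java sketch of the write logic such an output format might use. It deliberately avoids the Hadoop API: `ValueOnlyLineWriter` is a hypothetical stand-in for the record writer that a custom output format (modeled on TextOutputFormat) would return. It ignores the key, so no separator is ever written.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for the record writer returned by a custom,
// value-only output format. Plain Java, not the Hadoop API.
final class ValueOnlyLineWriter {
    private final PrintWriter out;

    ValueOnlyLineWriter(PrintWriter out) {
        this.out = out;
    }

    // The key is ignored entirely, so no key/value separator is written.
    // A null value produces no output line at all.
    void write(Object key, Object value) {
        if (value == null) {
            return;
        }
        out.print(value);
        out.print('\n');
    }

    public static void main(String[] args) {
        StringWriter buf = new StringWriter();
        PrintWriter pw = new PrintWriter(buf);
        ValueOnlyLineWriter writer = new ValueOnlyLineWriter(pw);
        writer.write("k1", "alpha");
        writer.write("k2", "beta");
        pw.flush();
        // Prints "alpha" and "beta" on separate lines, with no tab characters.
        System.out.print(buf);
    }
}
```

Dropping the key at write time keeps the change entirely in user code, without touching the streaming classes.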
> When using the Hadoop streaming jar, if the reduce job outputs only a value
> (no key), the code incorrectly outputs the value along with the tab
> (key/value separator) character.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-4913
> URL: https://issues.apache.org/jira/browse/HADOOP-4913
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.18.2
> Environment: Red Hat Linux 5.
> Reporter: John Fisher
> Priority: Minor
>
> I would like the output of my streaming job to be only the value, omitting
> the key and the key/value separator. However, when I print only the value,
> each line ends with a tab character. I believe I have tracked down the cause
> (described below), but I'm not 100% sure. The fix works for me, though, so I
> figured it might be worth incorporating into the code base.
> The tab gets printed because of a faulty check in the TextOutputFormat code.
> It checks whether the "key" and "value" objects are null. If neither is
> null, the line is printed as <key><separator><value>; otherwise only the key
> or the value is printed, whichever is defined. The bug is that the key and
> value are always defined (non-null). I traced further up the call chain to
> see whether these objects were being created when they shouldn't be, but
> that appears to be the intended behavior. I changed the Hadoop code to check
> for an empty string as well as a null object.
> *** Patch code begin ***
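> // Treat an empty key or value the same as a null one, so that no
> // separator is written when either side is empty.
> if (!nullKey) {
>   nullKey = (key.toString().length() == 0);
> }
> if (!nullValue) {
>   nullValue = (value.toString().length() == 0);
> }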
> *** Patch code end ***
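To make the described check concrete, here is a simplified, hypothetical model of the formatting logic; it is not Hadoop's actual TextOutputFormat source. Keys and values are plain Objects, and the `treatEmptyAsNull` flag is an invented switch that toggles the patch above on and off. It shows why a record with a non-empty key and a non-null but empty value still produces a trailing separator, and how the empty-string check suppresses it.

```java
// Simplified, hypothetical model of the formatting check described above.
// Keys and values are plain Objects; this is not the Hadoop source.
final class LineFormatSketch {
    private LineFormatSketch() {}

    // Returns the text that would be written for one record, or null if
    // nothing would be written. treatEmptyAsNull toggles the reporter's patch.
    static String format(Object key, Object value, String separator,
                         boolean treatEmptyAsNull) {
        boolean nullKey = (key == null);
        boolean nullValue = (value == null);
        if (treatEmptyAsNull) {
            // The patch: an empty string counts as "not defined".
            if (!nullKey) {
                nullKey = key.toString().isEmpty();
            }
            if (!nullValue) {
                nullValue = value.toString().isEmpty();
            }
        }
        if (nullKey && nullValue) {
            return null;
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        // The separator is written only when both sides are defined.
        if (!nullKey && !nullValue) {
            line.append(separator);
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        // Without the patch, the empty value still yields a trailing tab.
        System.out.print(format("hello", "", "\t", false).replace("\t", "<TAB>")); // hello<TAB>
        // With the patch, only the key is written.
        System.out.print(format("hello", "", "\t", true).replace("\t", "<TAB>"));  // hello
    }
}
```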
> The OutputCollector calls TextOutputFormat.write() with whatever objects are
> passed to it (see ReduceTask.java, line 300), so that part is fine. Further
> up, however, the run() method in PipeMapRed.java creates a new key object and
> a new value object and then starts reading lines and feeding them to the
> OutputCollector. This is why the key and value are always non-null by the
> time they reach TextOutputFormat.write(), and why we always see the tab.
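A similarly simplified model of the PipeMapRed behavior described here: the key and value holders are allocated once and reused for every line, so they are never null when handed to the collector. `MutableText`, `PairCollector`, and `StreamOutputSketch` are hypothetical names, and the rule that a line without a tab goes entirely into the key (leaving the value empty) is an assumption made for illustration, not something stated in the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal mutable string holder standing in for Hadoop's Text (hypothetical).
final class MutableText {
    private String s = "";

    void set(String s) {
        this.s = s;
    }

    @Override
    public String toString() {
        return s;
    }
}

// Hypothetical collector interface receiving each (key, value) pair.
interface PairCollector {
    void collect(MutableText key, MutableText value);
}

final class StreamOutputSketch {
    private StreamOutputSketch() {}

    // The key and value objects are created once, before the loop, and
    // reused for every line, so the collector never receives null.
    // Assumption for illustration: a line with no tab goes entirely into
    // the key, and the value is set to the empty string.
    static void run(List<String> lines, PairCollector collector) {
        MutableText key = new MutableText();
        MutableText value = new MutableText();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                key.set(line);
                value.set("");
            } else {
                key.set(line.substring(0, tab));
                value.set(line.substring(tab + 1));
            }
            collector.collect(key, value);
        }
    }

    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        // Both objects are always non-null here, so a null-only check writes
        // the separator for every line, including "alpha".
        run(Arrays.asList("alpha", "k\tv"),
            (k, v) -> written.add(k + "\t" + v));
        for (String s : written) {
            System.out.println(s.replace("\t", "<TAB>")); // alpha<TAB>, then k<TAB>v
        }
    }
}
```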
> Thanks,
> John
--
This message is automatically generated by JIRA.