[jira] Resolved: (HADOOP-4913) When using the Hadoop streaming jar if the reduce job outputs only a value (no key) the code incorrectly outputs the value along with the tab character (key/value) separator.

Johan Oskarsson (JIRA) Tue, 19 May 2009 09:33:08 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Johan Oskarsson resolved HADOOP-4913.
-------------------------------------

       Resolution: Won't Fix
    Fix Version/s:     (was: site)

You can do this in user code by implementing an output format that ignores the 
key and only saves the value. Have a look at TextOutputFormat for guidance.
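
A minimal sketch of the suggested workaround, written against plain java.io so
it runs without Hadoop on the classpath. ValueOnlyWriter is a hypothetical name
and not part of Hadoop; in a real job the same write logic would presumably
live in the record writer returned by a custom output format.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for a record writer; not a Hadoop class.
class ValueOnlyWriter {
    private final PrintWriter out;

    ValueOnlyWriter(PrintWriter out) {
        this.out = out;
    }

    // The key is accepted but deliberately ignored; only the value is
    // written, followed by a newline and no separator.
    void write(Object key, Object value) {
        if (value == null) {
            return; // nothing to emit for this record
        }
        out.print(value.toString());
        out.print('\n');
        out.flush();
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        ValueOnlyWriter writer = new ValueOnlyWriter(new PrintWriter(sink));
        writer.write("k1", "first");
        writer.write("", "second");
        System.out.print(sink); // prints "first" and "second" on separate lines
    }
}
```

Because the key never reaches the output, it does not matter whether the
framework hands the writer a null key, an empty key, or a populated one.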

> When using the Hadoop streaming jar if the reduce job outputs only a value 
> (no key) the code incorrectly outputs the value along with the tab character 
> (key/value) separator.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4913
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4913
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.18.2
>         Environment: Red Hat Linux 5.
>            Reporter: John Fisher
>            Priority: Minor
>
> I would like the output of my streaming job to only be the value, omitting 
> the key and key/value separator.  However, when only printing the value I am 
> noticing that each line is ending with a tab character.  I believe I have 
> tracked down the issue (described below) but I'm not 100% sure.  The fix is 
> working for me though so I figured maybe it should be incorporated into the 
> code base.
> The tab gets printed because of a faulty check in the TextOutputFormat code.
> It checks whether the "key" and "value" objects are null.  If both are
> non-null, the line is printed as <key><separator><value>; otherwise only the
> key or the value is printed, depending on which one is defined.  The bug is
> that the key and value are always defined.  I traced further up to see
> whether these objects were being defined when they shouldn't be, but that
> appears to be how it is meant to work.  I changed the Hadoop code to treat a
> zero-length string the same as a null object.
> *** Patch code begin ***
> // Treat an empty key as absent, not only a null one.
> if (!nullKey) {
>   nullKey = (key.toString().length() == 0);
> }
> // Likewise, treat an empty value as absent.
> if (!nullValue) {
>   nullValue = (value.toString().length() == 0);
> }
> *** Patch code end ***
> The OutputCollector calls the TextOutputFormat.write() method with whatever 
> objects are passed into it (see ReduceTask.java, line 300), so that is fine.
> But further up, in the run() method of PipeMapRed.java, you will 
> see that the code creates a new key and value object and then starts reading 
> lines and feeding them to the OutputCollector.  This is why the key and value 
> are always defined by the time they hit TextOutputFormat.write() and why 
> we always see the tab.
> Thanks,
> John
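
The check described in the report, with the proposed empty-string test folded
in, can be sketched roughly as follows. This is a standalone illustration, not
the Hadoop source: LineFormatter, isAbsent and format are hypothetical names,
and only the nullKey/nullValue logic is taken from the patch above.

```java
// Hypothetical helper mirroring the key/separator/value decision.
class LineFormatter {
    // An object counts as absent if it is null or renders as an empty string.
    static boolean isAbsent(Object o) {
        return o == null || o.toString().length() == 0;
    }

    // Returns the output line without a trailing newline, or null when
    // there is nothing to write.
    static String format(Object key, Object value, String separator) {
        boolean nullKey = isAbsent(key);
        boolean nullValue = isAbsent(value);
        if (nullKey && nullValue) {
            return null;
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        // The separator is emitted only when both sides are present.
        if (!nullKey && !nullValue) {
            line.append(separator);
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // An empty value no longer produces a trailing tab.
        System.out.println("[" + format("line", "", "\t") + "]");
    }
}
```

One consequence of this approach is that a record whose value is legitimately
empty also loses its separator, so output changes for any job that relies on
the separator always being present.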

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
