[jira] Commented: (HADOOP-2806) Streaming has no way to force entire record (or null) as key

Hadoop QA (JIRA) Sat, 15 Mar 2008 03:48:07 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579027#action_12579027 ]


Hadoop QA commented on HADOOP-2806:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12377872/patch-2806.txt
against trunk revision 619744.

    @author +1.  The patch does not contain any @author tags.

    tests included -1.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/console

This message is automatically generated.

> Streaming has no way to force entire record (or null) as key
> ------------------------------------------------------------
>
>                 Key: HADOOP-2806
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2806
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Marco Nicosia
>            Assignee: Amareshwari Sriramadasu
>            Priority: Minor
>             Fix For: 0.17.0
>
>         Attachments: patch-2806.txt
>
>
> I think streaming perhaps needs an "-allkey" or "-nullkey" option? Otherwise,
> I'm concerned there is a subtle streaming documentation problem.
> These two docs:
> http://hadoop.apache.org/core/docs/current/streaming.html
> http://wiki.apache.org/hadoop/HadoopStreaming (should they be merged?)
> ... seem to ignore that streaming, by default, splits each line into key and
> value at the first TAB. They do mention it, but none of the simple
> (no-separator) examples account for the fact that streaming treats the whole
> line as the key when there is no tab, and only the text up to the first tab
> when there is one. This means records may be sorted differently depending on
> whether or not they happen to contain a tab.
> Here's a very simple pair of examples that, to a naive user, should produce
> the same output but do not:
> > [hod] (marco) >> run dfs -fs local -cat str-tabs
> > a       1
> > b       3
> > a       4
> > 
> > [hod] (marco) >> run dfs -put str-tabs str-tabs
> > 
> > [hod] (marco) >> run jar hadoop-streaming.jar -input str-tabs -output str-tabs.out -mapper /bin/cat -reducer /bin/cat
> > [blah blah blah]
> > 
> > [hod] (marco) >> run dfs -cat str-tabs.out/part-00000
> > a       4
> > a       1
> > b       3
> Compare with this negative test:
> > [hod] (marco) >> run dfs -fs local -cat str-notabs
> > a 1
> > b 3
> > a 4
> > 
> > [hod] (marco) >> run dfs -put str-notabs str-notabs
> > 
> > [hod] (marco) >> run jar hadoop-streaming.jar -input str-notabs -output str-notabs.out -mapper /bin/cat -reducer /bin/cat
> > [blah blah blah]
> > 
> > [hod] (marco) >> run dfs -cat str-notabs.out/part-00000
> > a 1
> > a 4
> > b 3
> > 
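
The key/value rule behind the two transcripts can be sketched as a small Python simulation. This is a hypothetical illustration, not Hadoop's code: split_record, sort_keys and has_ties are made-up names, and the sketch assumes only the default behavior described in the report (split at the first TAB; a line with no TAB becomes the whole key with an empty value).

```python
"""Hypothetical simulation of streaming's default key/value split.

Assumption: key = text before the first TAB, value = the rest; a line with
no TAB is entirely key with an empty value. Not Hadoop's actual code.
"""

from typing import List, Tuple

SEPARATOR = "\t"  # default key/value separator assumed by this sketch


def split_record(line: str) -> Tuple[str, str]:
    """Split one mapper output line into (key, value) at the first TAB."""
    idx = line.find(SEPARATOR)
    if idx == -1:
        # No TAB: the entire line becomes the key and the value is empty.
        return line, ""
    return line[:idx], line[idx + 1:]


def sort_keys(lines: List[str]) -> List[str]:
    """Return the keys the sort phase would order, in sorted order."""
    return sorted(split_record(line)[0] for line in lines)


def has_ties(lines: List[str]) -> bool:
    """True if two records share a key, so the key sort alone
    does not determine their relative order."""
    keys = [split_record(line)[0] for line in lines]
    return len(keys) != len(set(keys))


if __name__ == "__main__":
    str_tabs = ["a\t1", "b\t3", "a\t4"]    # contents of str-tabs
    str_notabs = ["a 1", "b 3", "a 4"]     # contents of str-notabs

    print([split_record(l) for l in str_tabs])
    print(sort_keys(str_tabs), has_ties(str_tabs))
    print([split_record(l) for l in str_notabs])
    print(sort_keys(str_notabs), has_ties(str_notabs))
```

For str-tabs, two records share the key "a", so sorting on keys alone leaves open whether "a 1" or "a 4" reaches the reducer first, which is consistent with "a 4" appearing before "a 1" in the first transcript. For str-notabs, every line is a distinct key, so the whole record is sorted.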

-- 
This message is automatically generated by JIRA.
