[ https://issues.apache.org/jira/browse/HADOOP-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579027#action_12579027 ]
Hadoop QA commented on HADOOP-2806:
-----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377872/patch-2806.txt
against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included -1. The patch doesn't appear to include any new or modified tests.
                       Please justify why no tests are needed for this patch.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1971/console

This message is automatically generated.

> Streaming has no way to force entire record (or null) as key
> ------------------------------------------------------------
>
>                 Key: HADOOP-2806
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2806
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Marco Nicosia
>            Assignee: Amareshwari Sriramadasu
>            Priority: Minor
>             Fix For: 0.17.0
>
>         Attachments: patch-2806.txt
>
>
> I think perhaps streaming needs a "-allkey" or "-nullkey" option? Otherwise,
> I'm concerned there is a subtle streaming documentation problem.
> These two docs:
> http://hadoop.apache.org/core/docs/current/streaming.html
> http://wiki.apache.org/hadoop/HadoopStreaming (Should be merged with above?)
> ... seem to ignore that streaming, by default, splits key/value on TAB. Sure,
> they mention it, but in all the simple (no separator) examples, they don't
> seem to take into account that streaming may inconsistently decide whether
> the whole line is the key, or just up to the first tab, should one occur.
> This means that some records might be sorted differently as compared to
> others based on whether or not there's a tab?
> Here's a very simple pair of examples, that to the naive, should produce the
> same output, but do not:
> > [hod] (marco) >> run dfs -fs local -cat str-tabs
> > a 1
> > b 3
> > a 4
> >
> > [hod] (marco) >> run dfs -put str-tabs str-tabs
> >
> > [hod] (marco) >> run jar hadoop-streaming.jar -input str-tabs -output
> > str-tabs.out -mapper /bin/cat -reducer /bin/cat
> > [blah blah blah]
> >
> > [hod] (marco) >> run dfs -cat str-tabs.out/part-00000
> > a 4
> > a 1
> > b 3
> Compare to this negative-test:
> > [hod] (marco) >> run dfs -fs local -cat str-notabs
> > a 1
> > b 3
> > a 4
> >
> > [hod] (marco) >> run dfs -put str-notabs str-notabs
> >
> > [hod] (marco) >> run jar hadoop-streaming.jar -input str-notabs -output
> > str-notabs.out -mapper /bin/cat -reducer /bin/cat
> > [blah blah blah]
> >
> > [hod] (marco) >> run dfs -cat str-notabs.out/part-00000
> > a 1
> > a 4
> > b 3

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
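The default splitting rule the report describes can be sketched as follows. This is a minimal Java illustration, and the class `StreamingKeySplit` and its methods are hypothetical (they are not part of Hadoop). It assumes the default documented for streaming: the key is the text before the first TAB and the value is the rest of the line, and a line with no TAB becomes the whole key with a null (here: empty) value. It also assumes that `str-notabs` separates its fields with spaces.

```java
/**
 * Hypothetical sketch (not Hadoop code) of streaming's documented default
 * key/value split, used to show why lines with and without TABs sort differently.
 */
public class StreamingKeySplit {

    /** A key/value pair produced from one line of mapper output. */
    public static final class KV {
        public final String key;
        public final String value;

        KV(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Key = text before the first TAB, value = text after it.
     * With no TAB, the whole line is the key and the value is empty
     * (the streaming docs describe this value as null).
     */
    public static KV split(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new KV(line, "");
        }
        return new KV(line.substring(0, tab), line.substring(tab + 1));
    }

    /** Orders two lines the way a key-only sort would: values are ignored. */
    public static int compareKeys(String a, String b) {
        return split(a).key.compareTo(split(b).key);
    }

    /** True when a key-only sort sees the two lines as a tie. */
    public static boolean sameKey(String a, String b) {
        return compareKeys(a, b) == 0;
    }

    public static void main(String[] args) {
        // Tab-separated records, as in str-tabs: both keys are "a", so they tie.
        System.out.println(sameKey("a\t1", "a\t4")); // prints true
        // Space-separated records (assumed layout of str-notabs): whole lines are keys.
        System.out.println(sameKey("a 1", "a 4"));   // prints false
    }
}
```

The framework sorts map output by key only, so the relative order of tied records such as `a<TAB>1` and `a<TAB>4` is not specified. That is consistent with the `a 4` / `a 1` output in the first run. Without TABs, each whole line is a distinct key, so the sort fully determines the order, as in the second run.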