[ https://issues.apache.org/jira/browse/HADOOP-4182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633982#action_12633982 ]
Abdul Qadeer commented on HADOOP-4182:
--------------------------------------

I updated the wiki page http://wiki.apache.org/hadoop/HadoopStreaming?action=diff as follows:

Line 28:
-Default Map input format: a line is a record in UTF-8 - the key part ends at first TAB, the rest of the line is the value
+Default Map input format: a line is a record in UTF-8. Every line must end
+ with an 'end of line' delimiter. The key part ends at first TAB, the rest
+ of the line is the value

> Streaming Documentation Update
> ------------------------------
>
>                 Key: HADOOP-4182
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4182
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>    Affects Versions: 0.19.0
>            Reporter: Abdul Qadeer
>            Priority: Minor
>             Fix For: 0.19.0
>
>
> When Text input data is used with streaming, every line is expected to end
> with a newline. Hadoop results are undefined if input files do not end in a
> newline. (The results will depend on how files are assigned to mappers.)
> Example:
> In streaming, if
>   mapper = xargs cat
>   reducer = cat
> and the input has two lines, each of which is a symbolic link in HDFS:
>   link1\n
>   link2\n
>   EOF
> where link1 points to a file that contains
>   This is line1EOF
> and link2 points to a file that contains
>   This is line2EOF
> then running the streaming job with only one split produces
>   This is line1This is line2\t\n
> but with two splits the result is
>   This is line1\t\n
>   This is line2\t\n
> In summary, the output depends on how many mappers are invoked. As a
> caution, the Streaming wiki should tell users to always end every line,
> including the last line of a file, with a newline to avoid such problems.

--
This message is automatically generated by JIRA.
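The split dependence described in the issue can be reproduced without a cluster. Below is a minimal Python sketch, not Hadoop code, under these assumptions: a mapper's stdout is split into records on newlines (a final fragment with no newline still becomes a record), each record is split into key and value at the first TAB as the wiki text says (no TAB means the whole line is the key and the value is empty), and an empty value is still written after a TAB, as the `\t\n` in the example output suggests. The names `parse_records`, `run_job`, and `FILES` are hypothetical.

```python
# Hypothetical simulation of the HADOOP-4182 example; no Hadoop APIs are used.
# Assumptions: mapper stdout is split into records on '\n' (a trailing fragment
# without '\n' still becomes a record), each record is split into key and value
# at the first TAB, and output is written as key TAB value newline.


def parse_records(stdout: str) -> list[tuple[str, str]]:
    """Split streaming output into (key, value) pairs at the first TAB."""
    lines = stdout.split("\n")
    if lines and lines[-1] == "":
        # A trailing '\n' ends the last record; it does not start a new one.
        lines.pop()
    records = []
    for line in lines:
        key, _, value = line.partition("\t")  # no TAB: whole line is the key
        records.append((key, value))
    return records


def run_job(files: dict[str, str], splits: list[list[str]]) -> str:
    """Run mapper `xargs cat` once per split, then reducer `cat`."""
    map_output = []
    for split in splits:
        # `xargs cat` concatenates the files verbatim, adding no separator.
        stdout = "".join(files[name] for name in split)
        map_output.extend(parse_records(stdout))
    # The shuffle sorts by key; reducer `cat` passes each record through.
    return "".join(f"{key}\t{value}\n" for key, value in sorted(map_output))


# Contents of the files behind link1 and link2; neither ends in a newline.
FILES = {"link1": "This is line1", "link2": "This is line2"}

if __name__ == "__main__":
    # One split: prints 'This is line1This is line2\t\n'
    print(repr(run_job(FILES, [["link1", "link2"]])))
    # Two splits: prints 'This is line1\t\nThis is line2\t\n'
    print(repr(run_job(FILES, [["link1"], ["link2"]])))
    # Newline-terminated files, one split: same result as two splits.
    fixed = {name: text + "\n" for name, text in FILES.items()}
    print(repr(run_job(fixed, [["link1", "link2"]])))
```

Once every file ends with a newline, the one-split run yields the same two records as the two-split run, which is the behavior the updated wiki text asks users to rely on.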