[ http://issues.apache.org/jira/browse/HADOOP-728?page=comments#action_12452419 ]

Sanjay Dahiya commented on HADOOP-728:
--------------------------------------
Planning to make the following changes in streaming for this:

1. Use PhasedFileSystem for map output in the case of -reducer NONE. It is currently package-protected; I have made it public so that streaming can use it. This lets maps generate multiple files with side effects and avoids duplicating functionality.

2. Currently, with -reducer NONE, map output is written explicitly to a single DFS file, so all map tasks try to write to the same file in DFS (@see PipeMapRed.java:264), which causes this problem. This is changed to treat <-output> as a directory that receives the map output. Each map output file name includes the task id to avoid conflicts, and PhasedFileSystem now takes care of speculative maps trying to write to the same DFS file.

Comments?

> Map-reduce task does not produce correct results when -reducer NONE is
> specified through streaming
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-728
>                 URL: http://issues.apache.org/jira/browse/HADOOP-728
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: dhruba borthakur
>         Assigned To: Sanjay Dahiya
>
> a) a file is created for the output instead of a directory.
> b) there is no way to understand what is going on from the client output
> I can produce an example for you, if you like -- but the behavior is
> consistent, so $HSTREAM -mapper /bin/cat -reducer NONE should show the problem
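For illustration, here is a minimal in-memory sketch of the scheme in point 2: <-output> is treated as a directory, each map task gets its own file named after its task id, and each attempt (including a speculative one) writes to a private temporary location that is only promoted on commit, with the first committing attempt winning. This is an assumption about one plausible way to get the behavior described, not the PhasedFileSystem API: the class PhasedOutputSketch, its methods, and the task/attempt id strings are all hypothetical, and the real code works against DFS paths rather than a map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model (NOT Hadoop's PhasedFileSystem API) of
 * map-only output with -reducer NONE: one file per map task under the
 * output directory, with per-attempt staging so speculative attempts
 * never write to the same final file.
 */
final class PhasedOutputSketch {
    // Published files, keyed by full path under the output directory.
    private final Map<String, String> committed = new HashMap<>();
    // Staged (uncommitted) output, keyed by attempt id.
    private final Map<String, String> pending = new HashMap<>();
    private final String outputDir;

    PhasedOutputSketch(String outputDir) {
        this.outputDir = outputDir;
    }

    /** Final path for a map task; the task id keeps maps from colliding. */
    String finalPath(String taskId) {
        return outputDir + "/" + taskId;
    }

    /** Append data to this attempt's private staging area. */
    void write(String attemptId, String data) {
        pending.merge(attemptId, data, String::concat);
    }

    /**
     * Promote an attempt's staged output to the task's final path.
     * Returns false and discards the data if another attempt of the same
     * task already committed (e.g. the slower speculative attempt).
     */
    boolean commit(String taskId, String attemptId) {
        String data = pending.remove(attemptId);
        String target = finalPath(taskId);
        if (data == null || committed.containsKey(target)) {
            return false;
        }
        committed.put(target, data);
        return true;
    }

    /** Throw away an attempt's staged output without publishing it. */
    void abort(String attemptId) {
        pending.remove(attemptId);
    }

    /** Read-only view of the files that ended up in the output directory. */
    Map<String, String> committedFiles() {
        return Collections.unmodifiableMap(committed);
    }
}
```

In this model two attempts of map_0 can both run to completion, but only one file, /out/map_0, is published; staging per attempt and publishing per task is what keeps speculative execution from clobbering a shared output file.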