Author: harsh
Date: Fri Mar 9 21:07:33 2012
New Revision: 1299045

URL: http://svn.apache.org/viewvc?rev=1299045&view=rev
Log:
MAPREDUCE-3991. Streaming FAQ has some wrong instructions about input files splitting. (harsh)
Modified:
    hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
    hadoop/common/trunk/hadoop-mapreduce-project/src/docs/src/documentation/content/xdocs/streaming.xml

Modified: hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt?rev=1299045&r1=1299044&r2=1299045&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt (original)
+++ hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt Fri Mar 9 21:07:33 2012
@@ -113,6 +113,8 @@ Release 0.23.3 - UNRELEASED
     MAPREDUCE-3885. Avoid an unnecessary copy for all requests/responses in
     MRs ProtoOverHadoopRpcEngine. (Devaraj Das via sseth)
 
+    MAPREDUCE-3991. Streaming FAQ has some wrong instructions about input files splitting. (harsh)
+
   OPTIMIZATIONS
 
   BUG FIXES

Modified: hadoop/common/trunk/hadoop-mapreduce-project/src/docs/src/documentation/content/xdocs/streaming.xml
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/src/docs/src/documentation/content/xdocs/streaming.xml?rev=1299045&r1=1299044&r2=1299045&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/src/docs/src/documentation/content/xdocs/streaming.xml (original)
+++ hadoop/common/trunk/hadoop-mapreduce-project/src/docs/src/documentation/content/xdocs/streaming.xml Fri Mar 9 21:07:33 2012
@@ -750,7 +750,7 @@ You can use Hadoop Streaming to do this.
 As an example, consider the problem of zipping (compressing) a set of files across the hadoop cluster. You can achieve this using either of these methods:
 </p><ol>
 <li> Hadoop Streaming and custom mapper script:<ul>
-  <li> Generate a file containing the full HDFS path of the input files. Each map task would get one file name as input.</li>
+  <li> Generate files listing the full HDFS paths of the files to be processed. Each list file is the input for an individual map task which processes the files listed.</li>
   <li> Create a mapper script which, given a filename, will get the file to local disk, gzip the file and put it back in the desired output directory</li>
 </ul></li>
 <li>The existing Hadoop Framework:<ul>
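For context, a minimal sketch of the custom mapper script the updated FAQ text describes. This is illustrative only: the script name zip_mapper.py, the output directory /user/example/zipped, and the assumption that the hadoop fs CLI is on the task nodes' PATH are all assumptions, not part of this commit.

    #!/usr/bin/env python
    # zip_mapper.py -- hypothetical Hadoop Streaming mapper sketch.
    # Each input line (from a list file) is the full HDFS path of a file
    # to fetch, gzip locally, and put back into an output directory.
    import os
    import subprocess
    import sys

    OUTPUT_DIR = "/user/example/zipped"  # hypothetical destination (assumption)

    for line in sys.stdin:
        hdfs_path = line.strip()
        if not hdfs_path:
            continue
        local_name = os.path.basename(hdfs_path)
        # Copy the file from HDFS to the task's local working directory.
        subprocess.check_call(["hadoop", "fs", "-get", hdfs_path, local_name])
        # Compress it on local disk.
        subprocess.check_call(["gzip", local_name])
        # Put the compressed file back into the desired HDFS output directory.
        subprocess.check_call(["hadoop", "fs", "-put", local_name + ".gz",
                               OUTPUT_DIR + "/" + local_name + ".gz"])
        # Emit a status line so the task produces some mapper output.
        print("%s\tcompressed" % hdfs_path)

A streaming job over the generated list files could then be launched along these lines (paths and the streaming jar location are placeholders that vary by release):

    hadoop jar hadoop-streaming-*.jar \
        -D mapred.reduce.tasks=0 \
        -input /user/example/filelists \
        -output /user/example/zip-logs \
        -mapper zip_mapper.py \
        -file zip_mapper.py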