[ http://issues.apache.org/jira/browse/HADOOP-191?page=comments#action_12377480 ]
Michel Tourn commented on HADOOP-191:
-------------------------------------
The usage message:
hadoop-trunk>bin/hadoopStreaming
Usage: hadoopStreaming [options]
Options:
  -input       <path>   DFS input file(s) for the Map step
  -output      <path>   DFS output directory for the Reduce step
  -mapper      <cmd>    The streaming command to run as the Map step
  -reducer     <cmd>    The streaming command to run as the Reduce step
  -files       <file>   Additional files to be shipped in the Job jar file
  -cluster     <name>   Default uses hadoop-default.xml and hadoop-site.xml
  -config      <file>   Optional. One or more paths to xml config files
  -inputreader <spec>   Optional. See below
  -verbose
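For orientation, a minimal invocation combining the required options might look like the sketch below; the paths are hypothetical, and /bin/cat and uniq merely stand in for real streaming programs:
  bin/hadoopStreaming -input "/user/me/in/*" -output /user/me/out \
      -mapper /bin/cat -reducer uniq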
In -input: globbing on <path> is supported, and multiple -input options may be given.
Default Map input format: each line is a record in UTF-8;
the key part ends at the first TAB, and the rest of the line is the value.
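As an illustration of the default format (the log line here is made up): given the input line below, the text before the first TAB becomes the key and the remainder becomes the value handed to the mapper command.
  20060412<TAB>hostA GET /index.html 200
  key   = 20060412
  value = hostA GET /index.html 200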
Custom Map input format: -inputreader package.MyRecordReader,n=v,n=v
Comma-separated name=value pairs can be given to configure the InputFormat.
Ex: -inputreader 'StreamXmlRecordReader,begin=<doc>,end=</doc>'
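Putting that Ex together into a full command line might look like the following sketch; the input path, output path, and the choice of /bin/cat as mapper are assumptions, not taken from the patch:
  bin/hadoopStreaming -input "/data/xmldocs/*" -output /data/xmldocs-out \
      -mapper /bin/cat -reducer uniq \
      -inputreader 'StreamXmlRecordReader,begin=<doc>,end=</doc>'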
Map output format and reduce input/output format:
The format is defined by whatever the mapper command outputs; it is line-oriented.
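For instance, any line-oriented program can define its own output format. The awk one-liner below (an illustration, not part of the patch) emits one key<TAB>line record per input line, keyed by the line's first field, and could be passed directly as the -mapper argument:
  awk '{ print $1 "\t" $0 }'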
Mapper and Reducer <cmd> syntax:
If a mapper or reducer program is prefixed with noship:, its path is assumed to be
a valid absolute path on the task tracker machines and it is NOT packaged with the
Job jar file.
Use -cluster <name> to switch between "local" Hadoop and one or more remote
Hadoop clusters.
The default is to use the normal hadoop-default.xml and hadoop-site.xml;
otherwise the configuration is taken from $HADOOP_HOME/conf/hadoop-<name>.xml.
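As a sketch (the cluster name "kryptonite" is invented for the example): with -cluster kryptonite, the job would read $HADOOP_HOME/conf/hadoop-kryptonite.xml rather than the default pair of files:
  bin/hadoopStreaming -cluster kryptonite -input "/logs/0604*/*" -output /logs/out \
      -mapper /bin/cat -reducer uniq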
Example: hadoopStreaming -mapper "noship:/usr/local/bin/perl5 filter.pl"
-files /local/filter.pl -input "/logs/0604*/*" [...]
This ships a script and invokes the non-shipped perl interpreter.
Shipped files go to the working directory, so filter.pl is found by perl.
The input files are all the daily logs for days in month 2006-04.
> add hadoopStreaming to src/contrib
> ----------------------------------
>
> Key: HADOOP-191
> URL: http://issues.apache.org/jira/browse/HADOOP-191
> Project: Hadoop
> Type: New Feature
> Reporter: Michel Tourn
> Assignee: Doug Cutting
> Attachments: streaming.patch
>
> This is a patch that adds a src/contrib/hadoopStreaming directory to the
> source tree.
> hadoopStreaming is a bridge to run non-Java code as Map/Reduce tasks.
> The unit test TestStreaming runs the Unix tools tr (as Map) and uniq (as Reduce);
> see the command-line sketch after this quoted description.
> To test the patch:
> Merge the patch.
> The only existing file that is modified is trunk/build.xml
> trunk>ant deploy-contrib
> trunk>bin/hadoopStreaming : should show usage message
> trunk>ant test-contrib : should run one test successfully
> To add src/contrib/someOtherProject:
> edit src/contrib/build.xml
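The quoted description says TestStreaming drives tr as the Map step and uniq as the Reduce step; an equivalent job from the command line might look like the sketch below. The input/output paths, the tr arguments, and the /usr/bin locations are assumptions, not taken from the test itself.
  bin/hadoopStreaming -input "/user/me/words/*" -output /user/me/words-out \
      -mapper "noship:/usr/bin/tr '[A-Z]' '[a-z]'" \
      -reducer noship:/usr/bin/uniq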