Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by DougCutting:
http://wiki.apache.org/lucene-hadoop/HadoopStreaming

The comment on the change is:
initial page on HadoopStreaming

New page:
{{{
% bin/hadoopStreaming
Usage: hadoopStreaming [options]
Options:
  -input   <path>     DFS input file(s) for the Map step
  -output  <path>     DFS output directory for the Reduce step
  -mapper  <cmd>      The streaming command to run
  -reducer <cmd>      The streaming command to run
  -files   <file>     Additional files to be shipped in the Job jar file
  -cluster <name>     Default uses hadoop-default.xml and hadoop-site.xml
  -config  <file>     Optional. One or more paths to xml config files
  -inputreader <spec> Optional. See below
  -verbose

In -input: globbing on <path> is supported and can have multiple -input
Default Map input format: a line is a record in UTF-8
  the key part ends at first TAB, the rest of the line is the value
Custom Map input format: -inputreader package.MyRecordReader,n=v,n=v
  comma-separated name-values can be specified to configure the InputFormat
  Ex: -inputreader 'StreamXmlRecordReader,begin=<doc>,end=</doc>'
Map output format, reduce input/output format:
  Format defined by what mapper command outputs. Line-oriented
Mapper and Reducer <cmd> syntax:
  If the mapper or reducer programs are prefixed with noship: then
  the paths are assumed to be valid absolute paths on the task tracker machines
  and are NOT packaged with the Job jar file.
Use -cluster <name> to switch between "local" Hadoop and one or more remote
  Hadoop clusters.
  The default is to use the normal hadoop-default.xml and hadoop-site.xml
  Else configuration will use $HADOOP_HOME/conf/hadoop-<name>.xml

Example: hadoopStreaming -mapper "noship:/usr/local/bin/perl5 filter.pl"
           -files /local/filter.pl -input "/logs/0604*/*" [...]
  Ships a script, invokes the non-shipped perl interpreter
  Shipped files go to the working directory so filter.pl is found by perl
  Input files are all the daily logs for days in month 2006-04
}}}
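
The usage text describes the default Map input format as one record per line, with the key ending at the first TAB and the rest of the line being the value. A minimal Python sketch of a line-oriented mapper built on that rule might look like the following. The names `split_record` and `run_mapper` are hypothetical and not part of hadoopStreaming. Treating a line with no TAB as a key with an empty value is an assumption, since the usage text does not cover that case, and emitting the value length is just a placeholder for real mapper logic.

```python
def split_record(line):
    """Split one input line into (key, value) at the first TAB.

    Assumption: a line without a TAB becomes (line, ""); the usage
    text does not specify what the framework does in that case.
    """
    line = line.rstrip("\n")
    # partition() splits at the first occurrence only, so any later
    # TABs stay inside the value.
    key, _sep, value = line.partition("\t")
    return key, value


def run_mapper(lines, out):
    """Hypothetical mapper: for each record, emit key TAB len(value).

    Output is line-oriented, matching the statement that the map
    output format is whatever the mapper command prints.
    """
    for line in lines:
        key, value = split_record(line)
        out.write(f"{key}\t{len(value)}\n")


# A script used as a streaming command would typically end with:
#   import sys
#   run_mapper(sys.stdin, sys.stdout)
```

Modeled on the example above, such a script could presumably be shipped with `-files` and run through a non-shipped interpreter via the `noship:` prefix; the script name and interpreter path would be specific to your cluster.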