Hi Aayush,
Do you want one map task to run one command? You can provide an input file whose lines are of the form <file> <outputfile>, and use NLineInputFormat, which makes each input split N lines, i.e. gives N lines to one map for processing. By default, N is one. Your map can then simply run the shell command on its input line. Would this suit your need?
More details @
http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html
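As a rough illustration (not tested against your setup): if you use Hadoop Streaming with NLineInputFormat, each map task receives its line(s) on stdin, and a small mapper script can parse the line and invoke your tool. The "./run" name comes from your mail; everything else here is an assumption.

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming mapper sketch. Each map task receives
# N input lines (one by default with NLineInputFormat); every line is
# expected to name an input file and an output file for the external
# command, e.g. "file1 outputfile1".
import subprocess
import sys

def build_command(line):
    """Turn one line '<file> <outputfile>' into the argv list for the
    external './run' tool (tool name taken from the original mail)."""
    infile, outfile = line.split()
    return ["./run", infile, outfile]

def run_mapper(stream):
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        # Streaming may deliver 'byte-offset<TAB>line'; keep only the
        # value part if a tab is present.
        if "\t" in line:
            line = line.split("\t", 1)[1]
        subprocess.call(build_command(line))

# To use: run_mapper(sys.stdin) inside the streaming job, e.g.
#   hadoop jar hadoop-streaming.jar \
#     -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
#     -mapper mapper.py -input <dir> -output <dir>
```

Since each line becomes its own map task, the framework will schedule as many of these in parallel as your cluster's map slots allow, which is exactly the multi-core parallelism you asked about.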
Thanks,
Amareshwari
Aayush Garg wrote:
Hi,

I have a 5-node cluster running Hadoop. All nodes are multi-core.
My program runs a shell command in its map function, and this shell
command takes one file as input. Many such files are copied into
HDFS.

So, in summary, the map function will run a command like ./run <file1>
<outputfile1>

Could you please suggest an optimized way to do this, e.g. whether I can
use the multi-core processing of the nodes and run many such maps in parallel?

Thanks,
Aayush

