I'll actually invoke one executable from each of my maps. Because this C++ program has already been implemented and used in the past, I just want to integrate it into our Hadoop map/reduce job without having to re-implement the process in Java. So my map is going to be very simple: it just calls the process and passes it the input files.
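For reference, a minimal sketch of what such a mapper might look like (assuming the new-style mapreduce API, input records that each carry one HDFS path, and a copy-to-local-disk step; only the "myprocess -file" invocation itself comes from this thread):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ExternalProcessMapper
        extends Mapper<LongWritable, Text, Text, Text> {

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        Path hdfsPath = new Path(value.toString());

        // Copy the HDFS file to the task's local working directory,
        // since a plain external binary cannot open hdfs:// paths.
        Path localPath = new Path("input-" + hdfsPath.getName());
        FileSystem.get(conf).copyToLocalFile(hdfsPath, localPath);

        // Hand the local copy to the pre-existing C++ binary.
        Process child = new ProcessBuilder(
            "myprocess", "-file", localPath.toString()).start();
        int exit = child.waitFor();
        if (exit != 0) {
          throw new IOException("myprocess exited with code " + exit);
        }
        context.write(value, new Text("exit=" + exit));
      }
    }

The copy-to-local step is the simplest way to feed a binary that knows nothing about HDFS; the binary would also have to be present on every task node, e.g. shipped via the DistributedCache.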
Thanks,
Grace

-----Original Message-----
From: Arun C Murthy [mailto:a...@hortonworks.com]
Sent: Tuesday, August 23, 2011 9:51 AM
To: common-dev@hadoop.apache.org
Subject: Re: how to pass a hdfs file to a c++ process

On Aug 22, 2011, at 12:57 PM, Zhixuan Zhu wrote:

> Hi All,
>
> I'm using hadoop-0.20.2 to try out some simple tasks. I asked a question
> about FileInputFormat a few days ago and got some prompt replies from
> this forum and it helped a lot. Thanks again! Now I have another
> question. I'm trying to invoke a C++ process from my mapper for each
> HDFS file in the input directory to achieve some parallel processing.

That seems weird - why aren't you using more maps and one file per map?

> But how do I pass the file to the program? I would want to do something
> like the following in my mapper:

IAC, libhdfs is one way to do HDFS ops via C/C++.

Arun

> Process lChldProc = Runtime.getRuntime().exec("myprocess -file
> $filepath");
>
> How do I pass the hdfs filesystem to an outside process like that? Is
> HadoopStreaming the direction I should go?
>
> Thanks very much for any reply in advance.
>
> Best,
> Grace
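Arun's pointer covers the C/C++ side: libhdfs lets the external program open and read HDFS files itself. If the existing binary could instead be taught to read standard input, another option is to keep all HDFS I/O in Java and pipe the file's bytes into the child process, which is essentially the contract Hadoop Streaming relies on. A sketch under that assumption (the class and method names here are hypothetical):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsToStdinRunner {

      // Streams one HDFS file into the stdin of an external command
      // and returns the command's exit code.
      public static int run(Configuration conf, Path hdfsFile, String... cmd)
          throws IOException, InterruptedException {
        FileSystem fs = FileSystem.get(conf);
        Process child = new ProcessBuilder(cmd).start();
        InputStream in = fs.open(hdfsFile);
        OutputStream childStdin = child.getOutputStream();
        try {
          // 'false' tells copyBytes to leave both streams open,
          // so we close them ourselves below.
          IOUtils.copyBytes(in, childStdin, conf, false);
        } finally {
          IOUtils.closeStream(in);
          IOUtils.closeStream(childStdin); // closing stdin is the child's EOF
        }
        return child.waitFor();
      }
    }

A mapper could then call HdfsToStdinRunner.run(context.getConfiguration(), new Path(value.toString()), "myprocess") and avoid materializing the file on local disk at all.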