Hi, I would like to bundle a binary with a Hadoop job and call it from inside the mappers/reducers.
The binary is a C++ program that I don't want to re-implement in Java. I want to fork it as a subprocess from inside the mappers/reducers and capture its output (on stdout). So I need to get the binary onto the compute nodes and figure out how to call it.

Ideally, the binary would be copied to the compute nodes alongside the job jar; I'm not interested in solutions that involve copying the binary to the nodes by hand.

Note that Streaming is not a solution here: the binary itself is not the mapper or reducer; the binary needs to be *called* from inside the mapper/reducer.
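To make the calling side concrete, here is roughly what I have in mind in the mapper. This is only a sketch: it assumes the binary somehow ends up in the task's working directory (via the -files generic option / DistributedCache, if that's the right mechanism?), and "my_binary" plus the key/value types are just placeholders for my actual setup.

// Rough sketch (new org.apache.hadoop.mapreduce API). Assumes the binary
// has already been shipped to the node and sits in the task's working
// directory; "my_binary" is a placeholder name.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BinaryCallingMapper
    extends Mapper<LongWritable, Text, Text, Text> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Fork the bundled binary, handing it the record on the command line.
    ProcessBuilder pb = new ProcessBuilder("./my_binary", value.toString());
    pb.redirectErrorStream(true);  // fold stderr into stdout for simplicity
    Process p = pb.start();

    // Capture everything the binary prints on stdout.
    StringBuilder out = new StringBuilder();
    try (BufferedReader r =
        new BufferedReader(new InputStreamReader(p.getInputStream()))) {
      String line;
      while ((line = r.readLine()) != null) {
        out.append(line).append('\n');
      }
    }

    int rc = p.waitFor();
    if (rc != 0) {
      throw new IOException("my_binary exited with status " + rc);
    }
    context.write(value, new Text(out.toString()));
  }
}

The part I haven't figured out is the distribution step: does the file land somewhere predictable on the compute node, and does it keep its execute bit?

Does anyone have experience with this? Any suggestions are much appreciated!

-daren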
