Looks like I can use -cacheFile or DistributedCache.addCacheFile() to send 
read-only files (on HDFS) to mappers and reducers.  This is particularly useful 
for streaming mappers and reducers.

Question: in the streaming case, can a similar mechanism be used to send data
back, or is stdout the only option?  I would like to send an HDFS file reference
to my streaming native code, have the code process it and produce a new file, and
then send *that* reference back as the emitted key/value for the reducer instead of
serializing the file over stdout.  For one thing, these are binary files, and
while I realize streaming has evolved to accept binary I/O, I am curious about
the file-reference-passing approach as well.
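For concreteness, here is a minimal sketch of the file-reference-passing idea as a Python streaming mapper. It assumes the job input is a text file with one HDFS path per line, that the `hadoop fs -get`/`-put` commands are on the task's PATH, and that a hypothetical `RESULT_DIR` environment variable names an HDFS output directory. `process_file` is a hypothetical stand-in for the native code (it just reverses the bytes). Only the short output path travels over stdout; the binary payload moves through HDFS.

```python
import os
import subprocess
import sys
import tempfile


def hdfs_get(src, dst):
    """Copy an HDFS file to the local filesystem using the hadoop CLI."""
    subprocess.run(["hadoop", "fs", "-get", src, dst], check=True)


def hdfs_put(src, dst):
    """Copy a local file into HDFS using the hadoop CLI."""
    subprocess.run(["hadoop", "fs", "-put", src, dst], check=True)


def process_file(local_in, local_out):
    """Hypothetical stand-in for the native binary: writes the input bytes reversed."""
    with open(local_in, "rb") as f:
        data = f.read()
    with open(local_out, "wb") as f:
        f.write(data[::-1])


def run_mapper(lines, out, output_dir, fetch=hdfs_get, store=hdfs_put,
               process=process_file):
    """For each input HDFS path: fetch it, process it locally, store the
    result back to output_dir, and emit "name<TAB>result_path" on `out`."""
    for line in lines:
        in_path = line.strip()
        if not in_path:
            continue  # skip blank lines
        name = os.path.basename(in_path)
        out_path = output_dir.rstrip("/") + "/" + name + ".out"
        with tempfile.TemporaryDirectory() as tmp:
            local_in = os.path.join(tmp, "in")
            local_out = os.path.join(tmp, "out")
            fetch(in_path, local_in)
            process(local_in, local_out)
            store(local_out, out_path)
        # Emit only the reference; the reducer reads the file from HDFS itself.
        out.write(f"{name}\t{out_path}\n")


if __name__ == "__main__":
    # RESULT_DIR is a hypothetical setting, e.g. passed with -cmdenv.
    run_mapper(sys.stdin, sys.stdout,
               os.environ.get("RESULT_DIR", "/tmp/results"))
```

The transfer functions are injectable so the logic can be exercised locally without a cluster. A real job would also have to cope with retried or speculative task attempts writing the same output path, since `-put` refuses to overwrite an existing file.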

Thanks.

________________________________________________________________________________
Keith Wiley     [email protected]     keithwiley.com    music.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
                                           --  Homer Simpson
________________________________________________________________________________