Looks like I can use -cacheFile or DistributedCache.addCacheFile() to send read-only files (on HDFS) to mappers and reducers. This is particularly useful for streaming mappers and reducers.
Question: in the streaming case, can a similar mechanism be used to send data back, or is stdout the only option? I would like to send an HDFS file reference to my streaming native code, have the code process it, produce a new file, and send *that* reference back as the emitted key/value for the reducer instead of serializing the file over stdout. These are binary files, for one thing, and while I realize streams have evolved to accept binary I/O, I am curious about the file-ref-passing approach as well.

Thanks.

________________________________________________________________________________
Keith Wiley [email protected] keithwiley.com music.keithwiley.com

"And what if we picked the wrong religion? Every week, we're just making God madder and madder!"
    -- Homer Simpson
________________________________________________________________________________
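
To make the file-ref-passing idea in the question concrete, here is a minimal sketch of what such a streaming mapper might look like, in Python rather than native code. It is not a confirmed Hadoop mechanism, only one way the approach could be wired up. The assumptions are: each input line is "key<TAB>HDFS path"; the output directory and the ".out" suffix are made up; process() is a placeholder for the real binary transformation; and the function names are hypothetical. The only external commands used are the standard `hadoop fs -get` and `hadoop fs -put`.

```python
import os
import shutil
import subprocess
import tempfile


def hdfs_get(hdfs_path, local_path):
    # Copy an HDFS file to local disk with the standard `hadoop fs -get`.
    subprocess.check_call(["hadoop", "fs", "-get", hdfs_path, local_path])


def hdfs_put(local_path, hdfs_path):
    # Copy a local file into HDFS with the standard `hadoop fs -put`.
    subprocess.check_call(["hadoop", "fs", "-put", local_path, hdfs_path])


def process(local_in, local_out):
    # Placeholder for the real binary processing (assumption): copies bytes as-is.
    with open(local_in, "rb") as src, open(local_out, "wb") as dst:
        dst.write(src.read())


def run_mapper(lines, out, output_dir, get=hdfs_get, put=hdfs_put):
    # For each "key<TAB>hdfs_path" line: fetch the file, process it locally,
    # store the result in HDFS, and emit "key<TAB>new_hdfs_path" on `out`.
    # Only the short reference travels over stdout, never the binary payload.
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, in_ref = line.split("\t", 1)
        # Hypothetical naming scheme; inputs with the same basename would collide.
        out_ref = output_dir.rstrip("/") + "/" + os.path.basename(in_ref) + ".out"
        workdir = tempfile.mkdtemp()
        try:
            local_in = os.path.join(workdir, "in")
            local_out = os.path.join(workdir, "out")
            get(in_ref, local_in)
            process(local_in, local_out)
            put(local_out, out_ref)
        finally:
            shutil.rmtree(workdir)
        out.write(key + "\t" + out_ref + "\n")
```

Used as a streaming mapper, the script would end with a call such as run_mapper(sys.stdin, sys.stdout, some_output_dir), and the reducer would receive only the emitted paths. The get and put parameters exist so the HDFS copies can be swapped for stubs when trying the logic without a cluster.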
