Hello,

Forgive me if I am missing something in the documentation, but nothing is jumping out at me.

I am exploring the use of Hadoop for image analysis and/or image vectorization and have a few questions. I anticipate a large collection of image files as input, with an equal number of output files. All files will be in raw binary format and are independent of each other. What I am trying to figure out is:

- Does Hadoop/MR offer a clean abstraction for both consuming and producing a large number of files? (I know it can handily consume a large number of files, but all the examples of output seem to form a single file.)
- Does Hadoop provide the input/output formats relevant to this, or would I have to create my own? (e.g., non-splittable binary input and binary output)
- Is this problem even well suited to Hadoop in the first place? This type of job may need only the map phase and not the reduce phase, so maybe I'm looking in the wrong place.

Thank you for your time. I only subscribe to the digest, so if you have questions for me regarding this, please cc me at [EMAIL PROTECTED]

Dan
