On Apr 17, 2008, at 11:20 PM, Sridhar Raman wrote:
I am new to MapReduce and Hadoop, and I have managed to find my way
through with a few programs. But I still have some doubts that are
constantly clinging onto me. I am not too sure whether these are basic
doubts, or just some documentation that I missed somewhere.
Take a look at http://tinyurl.com/4y7776 under InputFormats.
1) Should my input _always_ be text files? What if my input is in the
form of Java objects? Where do I handle this conversion?
You can define your own InputFormat that reads an arbitrary format,
or use SequenceFileInputFormat, which reads SequenceFiles.
SequenceFile is a file format defined by Hadoop to hold binary data
consisting of Writable keys and values.
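
To make the "Java objects as Writable keys and values" idea concrete,
here is a rough sketch of a record type that serializes itself in the
Writable style. It is only an illustration: WritableLike is a local
stand-in that mirrors the write/readFields pair of Hadoop's Writable
interface so the snippet compiles without Hadoop, and PointRecord, its
fields, and the toBytes/fromBytes helpers are hypothetical names that do
not come from this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in (assumption): mirrors the two methods of Hadoop's
// Writable interface so this sketch needs only the JDK.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type; the fields are illustrative only.
class PointRecord implements WritableLike {
    int id;
    double score;

    // No-arg constructor so an empty instance can be filled by readFields.
    PointRecord() {}

    PointRecord(int id, double score) {
        this.id = id;
        this.score = score;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields are written in a fixed order...
        out.writeInt(id);
        out.writeDouble(score);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // ...and must be read back in exactly the same order.
        id = in.readInt();
        score = in.readDouble();
    }

    // Helper: serialize any WritableLike into a byte array.
    static byte[] toBytes(WritableLike w) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            w.write(out);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: rebuild a PointRecord from bytes produced by toBytes.
    static PointRecord fromBytes(byte[] bytes) {
        try {
            PointRecord p = new PointRecord();
            p.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real job the class would implement Hadoop's own Writable rather than
the stand-in, and the records would typically be stored in a SequenceFile
instead of a byte array.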
2) How do I control how the output is written? For example, if I want
to output in a format that is my own, how do I do it?
That is controlled by the OutputFormat. It defaults to
TextOutputFormat, but you can either use SequenceFileOutputFormat or
make your own.
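
As a rough illustration of what "make your own" amounts to, the core of a
custom output format is an object that is handed each key/value pair and
decides how to lay it out. The sketch below is not Hadoop's API:
RecordWriterLike is a local stand-in loosely modeled on the role of
Hadoop's RecordWriter, and BracketedRecordWriter, its line layout, and the
render helper are hypothetical, chosen only so the snippet runs with the
JDK alone.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Local stand-in (assumption): captures the idea of a per-record writer.
// The real Hadoop interface has a different close signature.
interface RecordWriterLike<K, V> {
    void write(K key, V value) throws IOException;
    void close() throws IOException;
}

// Hypothetical writer that emits each pair as "[key] => value" on its own
// line, instead of the tab-separated lines TextOutputFormat produces.
class BracketedRecordWriter<K, V> implements RecordWriterLike<K, V> {
    private final Writer out;

    BracketedRecordWriter(Writer out) {
        this.out = out;
    }

    @Override
    public void write(K key, V value) throws IOException {
        out.write("[" + key + "] => " + value + "\n");
    }

    @Override
    public void close() throws IOException {
        out.close();
    }

    // Helper for demonstration: format parallel arrays of keys and values
    // into a single string using the custom layout.
    static String render(String[] keys, int[] values) {
        try {
            StringWriter sink = new StringWriter();
            BracketedRecordWriter<String, Integer> writer =
                    new BracketedRecordWriter<>(sink);
            for (int i = 0; i < keys.length; i++) {
                writer.write(keys[i], values[i]);
            }
            writer.close();
            return sink.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In an actual job, a class like this would be returned by a custom
OutputFormat and would write to a file in the job's output directory
rather than to an in-memory string.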
-- Owen