On Apr 17, 2008, at 11:20 PM, Sridhar Raman wrote:

I am new to MapReduce and Hadoop, and I have managed to find my way through
with a few programs.  But I still have some doubts that are constantly
clinging onto me. I am not too sure whether these are basic doubts, or just
some documentation that I missed somewhere.

Take a look at http://tinyurl.com/4y7776 under InputFormats.

1) Should my input _always_ be text files? What if my input is in the form
of Java objects?  Where do I handle this conversion?

You can define your own InputFormat to read an arbitrary format, or use SequenceFileInputFormat, which reads SequenceFiles. SequenceFile is a file format defined by Hadoop to hold binary data consisting of Writable keys and values.
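
As a rough illustration of where the object conversion happens, here is a minimal sketch of a Java object that serializes itself in the Writable style. To keep it runnable without Hadoop on the classpath, it declares a local stand-in interface with the same two methods as org.apache.hadoop.io.Writable. PointWritable, WritableRoundTrip, and the helper methods are hypothetical names invented for this example, not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
// declared here only so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical domain object; the name and fields are illustrative only.
class PointWritable implements Writable {
    int x;
    int y;

    // No-arg constructor so an empty instance can be created and then
    // populated by readFields.
    PointWritable() {}

    PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Object -> bytes: fields are written in a fixed order.
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Bytes -> object: fields are read back in the same order.
        x = in.readInt();
        y = in.readInt();
    }
}

class WritableRoundTrip {
    // Serializes any Writable into a byte array.
    static byte[] toBytes(Writable w) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            w.write(out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds a PointWritable from bytes produced by toBytes.
    static PointWritable fromBytes(byte[] bytes) {
        try {
            PointWritable p = new PointWritable();
            p.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PointWritable copy = fromBytes(toBytes(new PointWritable(3, -7)));
        System.out.println(copy.x + "," + copy.y);
    }
}
```

In a real job the class would implement Hadoop's own Writable interface instead of the stand-in, and the framework would call write and readFields for you when it writes and reads SequenceFiles.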

2) How do I control how the output is written? For example, if I want to
output in a format that is my own, how do I do it?

That is controlled by the OutputFormat. It defaults to TextOutputFormat, but you can use SequenceFileOutputFormat instead or write your own.
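
To make the idea of "your own format" concrete: an OutputFormat hands the framework a RecordWriter, and that writer decides how each key/value pair is laid out. The sketch below imitates that role in plain Java so it runs without Hadoop. SimpleRecordWriter and PipeDelimitedWriter are hypothetical names, the interface is a simplified stand-in rather than Hadoop's actual RecordWriter, and the key|value layout is just an assumed example format.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Simplified stand-in for the record-writer role; Hadoop's real
// RecordWriter has a different signature.
interface SimpleRecordWriter<K, V> {
    void write(K key, V value);
    void close();
}

// Hypothetical writer that emits one "key|value" line per record.
class PipeDelimitedWriter<K, V> implements SimpleRecordWriter<K, V> {
    private final Writer out;

    PipeDelimitedWriter(Writer out) {
        this.out = out;
    }

    @Override
    public void write(K key, V value) {
        try {
            // The record layout lives here; change this line to change the format.
            out.write(key + "|" + value + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class CustomOutputDemo {
    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        SimpleRecordWriter<String, Integer> writer = new PipeDelimitedWriter<>(sink);
        writer.write("apple", 3);
        writer.write("banana", 5);
        writer.close();
        System.out.print(sink);
    }
}
```

A real custom OutputFormat would return a writer like this from its getRecordWriter method, wrapped around an output stream on the job's file system rather than an in-memory StringWriter.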

-- Owen