Hi,

On Sat, Mar 5, 2011 at 9:03 AM, maha <[email protected]> wrote:
> Hi,
>
> I have 2 questions:
>
> 1) Is a SequenceFile more efficient than TextFiles for input? ... I think
> TextFiles will be processed by TextInputFormat into sequenceFiles inside
> hadoop. So will SequenceFiles (ie.binary input Files) be more efficient ?

Depends on what your scenario is.

> 2) If I decided to use SequenceFiles as InputFormat, Do I need to stick to
> the header protocol defined in
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html
> ?

No. You would use the SequenceFileInputFormat and SequenceFileOutputFormat
classes, which read and write that header for you.

May I suggest reading a good Hadoop book that neatly covers the little,
scattered stuff like this? I like Tom White's Hadoop: The Definitive Guide :)

-- 
Harsh J
www.harshj.com
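
A footnote on question 1, since "depends" is easier to see with something concrete. As far as I know, TextInputFormat does not turn text into SequenceFiles internally; it hands each line to the mapper as a Text value keyed by its byte offset, and the mapper parses the line itself. A SequenceFile instead stores already-serialized key/value pairs. The sketch below is NOT the SequenceFile format and uses no Hadoop classes. It is a plain-JDK analogy, with a hypothetical class name (RecordEncodingSketch) and made-up sample records, that encodes the same (int key, long value) records once as tab-separated text and once as fixed-width binary, then reads both back.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: an analogy for text vs. binary records,
// not Hadoop's SequenceFile layout (no header, sync markers, or compression).
public class RecordEncodingSketch {

    // Text encoding: one "key<TAB>value" line per record.
    static byte[] writeText(int[] keys, long[] values) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(buf, StandardCharsets.UTF_8)) {
            for (int i = 0; i < keys.length; i++) {
                w.write(keys[i] + "\t" + values[i] + "\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reading text: every line must be split and its numbers re-parsed.
    static long sumText(byte[] data) {
        long sum = 0;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] parts = line.split("\t");
                sum += Long.parseLong(parts[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    // Binary encoding: 4-byte int key followed by 8-byte long value.
    static byte[] writeBinary(int[] keys, long[] values) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (int i = 0; i < keys.length; i++) {
                out.writeInt(keys[i]);
                out.writeLong(values[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reading binary: typed fields come back directly, with no string parsing.
    static long sumBinary(byte[] data) {
        long sum = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                in.readInt();           // key, unused in this sum
                sum += in.readLong();   // value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 3};                        // made-up sample data
        long[] values = {10L, 2000000000000L, 30L};
        byte[] text = writeText(keys, values);
        byte[] binary = writeBinary(keys, values);
        System.out.println("text bytes=" + text.length + " sum=" + sumText(text));
        System.out.println("binary bytes=" + binary.length + " sum=" + sumBinary(binary));
    }
}
```

The binary form always costs 12 bytes per record here, while the text form shrinks or grows with the number of digits, so short numbers can actually be smaller as text. The binary reader skips the split-and-parse step. Which effect matters more depends on the data and on what the job does with it, which is roughly what "depends on your scenario" means.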
