One last question here. If both TextFile and SequenceFile can be compressed, then what's the advantage of the SequenceFile format?
Is it that a compressed file can be split into chunks only if it is stored as a SequenceFile?

Saurabh.

On Sat, Jul 25, 2009 at 4:14 PM, Zheng Shao <[email protected]> wrote:
> Both TextFile and SequenceFile can be compressed or uncompressed.
>
> TextFile means the plain text file (records delimited by "\n").
> Compressed TextFiles are just text files compressed by the gzip or bzip2
> utility.
> SequenceFile is a special file format that only Hadoop can understand.
>
> Since your files are compressed TextFiles, you have to create a table
> with TextFile format in order to load the data without any
> conversion.
> (Compression is detected automatically for both TextFile and
> SequenceFile; you don't need to specify it when creating a table.)
>
>
> Does this make things a bit clearer?
>
> Zheng
>
> On Sat, Jul 25, 2009 at 3:27 AM, Saurabh Nanda <[email protected]>
> wrote:
> >
> >> If you want to load data (in compressed/uncompressed text format) into
> >> a table, you have to define the table as "stored as textfile" instead
> >> of "stored as sequencefile".
> >
> > I'm completely confused right now. If sequencefiles are not used for
> > compressed data storage, then what are they used for?
> >
> > If I have a gz file and I want to import it as is (without gunzipping or
> > using an intermediate table), what should I be doing?
> >
> > Saurabh.
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
> >
>
>
>
> --
> Yours,
> Zheng
>

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
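To make Zheng's distinction concrete, here is a minimal Python sketch that uses only the standard library. It writes records as a compressed TextFile (one record per line, "\n"-delimited, gzip-compressed, the same kind of file the gzip utility produces) and reads them back. It then guesses a file's format from its leading magic bytes: gzip streams start with 0x1f 0x8b, bzip2 streams with "BZh", and Hadoop SequenceFiles with "SEQ" followed by a version byte. The helper names (write_gzip_textfile, read_gzip_textfile, sniff_format) are hypothetical. The sniffing is only an illustration and is not how Hadoop works: as I understand it, Hadoop picks the codec for text input from the file extension (such as .gz), while a SequenceFile records its codec in its own header.

```python
import gzip
import os
import tempfile

# Leading magic bytes of each format (illustrative sniffing only).
GZIP_MAGIC = b"\x1f\x8b"   # gzip stream header
BZIP2_MAGIC = b"BZh"       # bzip2 stream header
SEQFILE_MAGIC = b"SEQ"     # Hadoop SequenceFile header, followed by a version byte


def write_gzip_textfile(path, records):
    """Hypothetical helper: write records as a gzip-compressed TextFile,
    one record per line, each terminated by "\n"."""
    with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
        for rec in records:
            f.write(rec + "\n")


def read_gzip_textfile(path):
    """Hypothetical helper: read a gzip-compressed TextFile back into records."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        return [line.rstrip("\n") for line in f]


def sniff_format(path):
    """Hypothetical helper: guess the format from the first bytes.
    A plain text file that happens to start with "SEQ" or "BZh" would be
    misclassified, so this is a toy check, not a real detector."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip-compressed text"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2-compressed text"
    if head.startswith(SEQFILE_MAGIC):
        return "SequenceFile"
    return "plain text (or unknown)"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "part-00000.gz")
        write_gzip_textfile(p, ["1\talice", "2\tbob"])
        print(sniff_format(p))        # gzip-compressed text
        print(read_gzip_textfile(p))  # ['1\talice', '2\tbob']
```

A file like this is still a TextFile from Hive's point of view. Only the bytes on disk are compressed, which is why Zheng suggests "stored as textfile" for existing .gz data.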
