One last question here. If both TextFile and SequenceFile can be
compressed, then what's the advantage of the SequenceFile format?

Is it that a compressed file can be split into chunks only if it is stored
as a SequenceFile?

Saurabh.

On Sat, Jul 25, 2009 at 4:14 PM, Zheng Shao <[email protected]> wrote:

> Both TextFile and SequenceFile can be compressed or uncompressed.
>
> TextFile means the plain text file (records delimited by "\n").
> Compressed TextFiles are just text files compressed by the gzip or bzip2
> utility.
> SequenceFile is a special file format that only Hadoop can understand.
>
> Since your files are compressed TextFiles, you have to create a table
> with TextFile format, in order to load the data without any
> conversion.
> (Compression is detected automatically for both TextFile and
> SequenceFile - you don't need to specify it when creating a table)
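>
> A minimal sketch of what that could look like in HiveQL. The table
> name, the single STRING column and the file path below are made up
> for illustration; substitute your own schema and location:
>
> ```sql
> -- Hypothetical table: one STRING column holding each raw line.
> CREATE TABLE raw_logs (line STRING)
> STORED AS TEXTFILE;
>
> -- LOAD DATA copies the file as is, with no conversion.
> -- The path is a placeholder for your own .gz file.
> LOAD DATA LOCAL INPATH '/tmp/access.log.gz' INTO TABLE raw_logs;
> ```
>
> The .gz file should stay compressed in the table's directory and be
> decompressed on the fly when the table is queried.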
>
>
> Does this make things a bit clearer?
>
> Zheng
>
> On Sat, Jul 25, 2009 at 3:27 AM, Saurabh Nanda<[email protected]>
> wrote:
> >
> >> If you want to load data (in compressed/uncompressed text format) into
> >> a table, you have to define the table as "stored as textfile" instead
> >> of "stored as sequencefile".
> >
> > I'm completely confused right now. If sequencefiles are not used for
> > compressed data storage then what are they used for?
> >
> > If I have a gz file, and I want to import it as is (without gunzipping or
> > using an intermediate table), what should I be doing?
> >
> > Saurabh.
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
> >
>
>
>
> --
> Yours,
> Zheng
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
