Both TextFile and SequenceFile can be compressed or uncompressed.

A TextFile is a plain text file (records delimited by "\n").
A compressed TextFile is just a text file compressed with the gzip or
bzip2 utility.
A SequenceFile is a Hadoop-specific binary file format (key/value
records) that only Hadoop tools can read.

Since your files are compressed TextFiles, you have to create the
table with the TextFile format in order to load the data without any
conversion.
(Compression is detected automatically for both TextFile and
SequenceFile, so you don't need to specify it when creating the table.)
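
As a rough sketch of what that could look like for a gz file (the table
name, the single STRING column, and the file path are all made up for
illustration; adjust them to your data). As far as I know, the
compression codec is picked from the file extension, so keep the .gz
suffix on the file:

```sql
-- Hypothetical table: one STRING column holding each text line.
CREATE TABLE raw_logs (line STRING)
STORED AS TEXTFILE;

-- Hypothetical path. The .gz file is loaded as is, without gunzipping
-- and without an intermediate table.
LOAD DATA LOCAL INPATH '/tmp/access_log.gz' INTO TABLE raw_logs;

-- Quick sanity check that the compressed file is readable.
SELECT COUNT(1) FROM raw_logs;
```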


Does this make things a bit clearer?

Zheng

On Sat, Jul 25, 2009 at 3:27 AM, Saurabh Nanda<[email protected]> wrote:
>
>> If you want to load data (in compressed/uncompressed text format) into
>> a table, you have to define the table as "stored as textfile" instead
>> of "stored as sequencefile".
>
> I'm completely confused right now. If sequencefiles are not used for
> compressed data storage then what are they used for?
>
> If I have a gz file, and I want to import it as is (without gunzipping or
> using an intermediate table), what should I be doing?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng
