> TextFile means the plain text file (records delimited by "\n").
> Compressed TextFiles are just text files compressed by gzip or bzip2
> utility. SequenceFile is a special file format that only Hadoop can
> understand.
>
> Since your files are compressed TextFiles, you have to create a table
> with TextFile format, in order to load the data without any
> conversion.
>
> (Compression is detected automatically for both TextFile and
> SequenceFile - you don't need to specify it when creating a table)
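For anyone finding this thread later, here is a minimal HiveQL sketch of what the quoted advice amounts to. The table name raw_logs, its single STRING column, and the path /tmp/raw_logs.txt.gz are hypothetical placeholders, not taken from the original messages; adjust them to your own data.

```sql
-- Hypothetical table: one STRING column holding each "\n"-delimited record.
-- No compression clause is given; per the quoted message, it is detected
-- automatically for TextFile tables.
CREATE TABLE raw_logs (line STRING)
STORED AS TEXTFILE;

-- Hypothetical path: load the gzip-compressed text file as-is, without
-- converting it to another format first.
LOAD DATA LOCAL INPATH '/tmp/raw_logs.txt.gz' INTO TABLE raw_logs;

-- Quick sanity check that the records are readable.
SELECT COUNT(*) FROM raw_logs;
```

This only illustrates the TextFile route described above; it says nothing about how it compares with a compressed SequenceFile in performance.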
This really clears things up. I guess adding a note to the Wiki would put an end to the confusion permanently. A short note on which approach (compressed TextFile vs. compressed SequenceFile) gives the best performance would also be appreciated.

Saurabh.

-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
