Michael,

On Tue, Nov 13, 2007 at 08:56:36AM -0800, Michael Harris wrote:
>I have a question about file compression in Hadoop. When I set the 
>io.seqfile.compression.type=BLOCK does this also compress actual files I load 
>in the DFS or does this only control the map/reduce file compression? If it 
>doesn't compress the files on the file system, is there any way to compress a 
>file when it's loaded? The concern here is that I am just getting started with 
>Pig/Hadoop and have a very small cluster of around 5 nodes. I want to limit IO 
>wait by compressing the actual data. As a test when I compressed our 4GB log 
>file using rar it was only 280mb.
>

If you load files into HDFS as a SequenceFile and set 
io.seqfile.compression.type=BLOCK (or RECORD), the file will have compressed 
records. Equivalently, you can use one of the many 
SequenceFile.createWriter methods (see 
http://lucene.apache.org/hadoop/api/org/apache/hadoop/io/SequenceFile.html) to 
specify the compression type, compression codec, etc.
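
For example, something along these lines (an untested sketch; the class name 
and paths are just placeholders) would load a local log file into HDFS as a 
block-compressed SequenceFile, one record per line:

  import java.io.BufferedReader;
  import java.io.FileReader;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.compress.DefaultCodec;

  public class LogLoader {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path out = new Path("/logs/access.seq");  // placeholder HDFS path

      // BLOCK compression buffers many records and compresses them
      // together, which usually gives much better ratios on log data
      // than per-record (RECORD) compression.
      SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, conf, out, LongWritable.class, Text.class,
          SequenceFile.CompressionType.BLOCK, new DefaultCodec());

      BufferedReader in = new BufferedReader(new FileReader(args[0]));
      try {
        String line;
        long lineno = 0;
        LongWritable key = new LongWritable();
        Text value = new Text();
        while ((line = in.readLine()) != null) {
          key.set(lineno++);
          value.set(line);
          writer.append(key, value);  // each log line becomes one record
        }
      } finally {
        in.close();
        writer.close();
      }
    }
  }

You can swap DefaultCodec for any CompressionCodec available on your cluster.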

Arun
