I have a question about file compression in Hadoop. When I set 
io.seqfile.compression.type=BLOCK, does this also compress the actual files I load 
into the DFS, or does it only control map/reduce file compression? If it 
doesn't compress the files on the file system, is there any way to compress a 
file when it's loaded? The concern here is that I am just getting started with 
Pig/Hadoop and have a very small cluster of around 5 nodes, so I want to limit IO 
wait by compressing the actual data. As a test, when I compressed our 4 GB log 
file with rar it came out to only 280 MB.
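For example, is pre-compressing before the load the right approach? Something like the sketch below is what I had in mind (file names are just placeholders, and I'm using gzip here rather than rar since gzip is more common on Linux):

```shell
# Compress the log locally first, then load the compressed copy.
printf 'sample log line 1\nsample log line 2\n' > sample.log
gzip -c sample.log > sample.log.gz

# On the cluster the load step would then be something like:
#   hadoop fs -put sample.log.gz /logs/sample.log.gz
```

But I'm not sure whether Hadoop/Pig will read a file loaded that way transparently, or whether it needs to be compressed by Hadoop itself.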

Thanks,
Michael
