I have a question about file compression in Hadoop. When I set io.seqfile.compression.type=BLOCK, does this also compress the actual files I load into the DFS, or does it only control compression of MapReduce output? If it doesn't compress files on the filesystem, is there any way to compress a file when it's loaded?

For background: I'm just getting started with Pig/Hadoop and have a very small cluster of around 5 nodes. I want to limit I/O wait by compressing the actual data. As a test, compressing one of our 4 GB log files with RAR brought it down to only 280 MB.
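To make the question concrete, here's the kind of thing I was hoping would work (file names and paths are just placeholders for our real logs): compress the file on the local side before loading it, and have Hadoop/Pig read the compressed file directly.

    # placeholder names; access_log stands in for our real 4 GB log file
    gzip access_log
    hadoop fs -put access_log.gz /logs/access_log.gz

Is something along these lines the right approach, or is there a better way to get compressed data into the DFS?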
Thanks, Michael
