Hello,

We're using Hadoop ourselves in a C-oriented architecture, with libhdfs for storing files and Hadoop Pipes for map/reduce jobs. Since the data we store benefits a lot from compression, we're currently investigating ways to compress it.
Ideally we would perform block-level compression: each 64MB block of data would be compressed separately. Hadoop Pipes seems to provide a way to change the input reader and output writer to enable the GzipCodec; however, I did not find a good way to tell libhdfs to store files compressed. Does anyone have experience with this, or ideas on how best to approach this problem?

We're using Hadoop 0.20.2.

Regards,
Leon Mergen
