Hello,

We're using Hadoop in a C-oriented architecture, with libhdfs for storing
files and Hadoop.Pipes for map/reduce jobs. Since the data we store
compresses well, we're currently investigating ways to store it compressed.

Ideally we would perform block-level compression, with each 64MB block of
data compressed separately. Hadoop.Pipes seems to provide a way to configure
the input reader and output writer to use the GzipCodec; however, I did not
find a good way to tell libhdfs to store files compressed.
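To make the question concrete, here is a rough sketch of the fallback we are
considering: compressing in our own client code before the bytes ever reach
libhdfs. It splits a buffer into fixed-size chunks and compresses each chunk
independently, prefixing it with its raw and compressed lengths so any chunk
can be decoded on its own. All names here (compress_chunked, codec_fn,
store_codec) and the 8-byte framing are our own hypothetical choices, not
part of Hadoop. The "store" codec just copies bytes so the sketch runs
without external libraries; in practice it would wrap something like zlib's
compress2() (which produces zlib rather than gzip framing).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical codec signature: compress src[0..src_len) into dst (capacity
 * dst_cap). Returns the number of bytes written, or 0 on error. */
typedef size_t (*codec_fn)(const uint8_t *src, size_t src_len,
                           uint8_t *dst, size_t dst_cap);

/* Placeholder "store" codec: copies the bytes unchanged. A real client would
 * call a compression library here instead. */
static size_t store_codec(const uint8_t *src, size_t src_len,
                          uint8_t *dst, size_t dst_cap)
{
    if (src_len > dst_cap)
        return 0;
    memcpy(dst, src, src_len);
    return src_len;
}

/* Store a 32-bit value in big-endian byte order. */
static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Split `in` into chunks of at most `chunk_size` bytes and compress each one
 * independently into `out`, framed as
 *   [u32 raw length][u32 compressed length][compressed bytes].
 * Returns the total number of bytes written, or 0 on error (e.g. `out` is
 * too small). */
static size_t compress_chunked(const uint8_t *in, size_t in_len,
                               size_t chunk_size, codec_fn codec,
                               uint8_t *out, size_t out_cap)
{
    size_t in_pos = 0, out_pos = 0;

    assert(chunk_size > 0 && chunk_size <= UINT32_MAX);
    while (in_pos < in_len) {
        size_t left = in_len - in_pos;
        size_t raw = left < chunk_size ? left : chunk_size;

        /* Need room for the 8-byte header before compressing. */
        if (out_cap - out_pos < 8)
            return 0;
        size_t comp = codec(in + in_pos, raw, out + out_pos + 8,
                            out_cap - out_pos - 8);
        if (comp == 0 || comp > UINT32_MAX)
            return 0;

        put_u32(out + out_pos, (uint32_t)raw);
        put_u32(out + out_pos + 4, (uint32_t)comp);
        out_pos += 8 + comp;
        in_pos += raw;
    }
    return out_pos;
}
```

The resulting buffer would then be written with hdfsWrite() as usual. Two
caveats we're aware of: compressed chunk sizes vary, so they won't line up
with HDFS's 64MB block boundaries, and our map/reduce jobs would need a
matching reader for this custom framing. That is why a built-in option would
be preferable.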

Does anyone have experience with this, or ideas on how best to approach
the problem?

We're using Hadoop 0.20.2.

Regards,

Leon Mergen
