What's the best way to compress a folder in hadoop?

Félix López Fri, 29 Jun 2012 00:37:04 -0700

The folder contains files with text and other folders with text files. The
text is not key/value, it's just text. Something like this:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dumm...


I'm thinking about 3 options:

First. To use Hadoop Streaming, as proposed by Jeff Wu here:
http://stackoverflow.com/questions/7153087/hadoop-compress-file-in-hdfs

Second. To use a custom map/reduce job, with the IdentityMapper as the map
and a custom reducer that creates the zip file. I'm not sure whether the
reducer will have information about the parent folders; that may require a
custom mapper. Something similar to
https://github.com/flopezluis/testing-hadoop/blob/master/src/pruebas/Reduce.java
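
To make the parent-folder concern in the second option concrete, here is a
minimal sketch of only the zip-writing step, using plain java.util.zip and no
Hadoop classes. It assumes (this is not in the linked Reduce.java) that a
custom mapper would emit each file's path relative to the input folder as the
key and the file's text as the value. The class name ZipSketch and the methods
zip and unzip are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipSketch {

    // Writes one zip entry per (relative path, text) pair. The entry name
    // carries the parent folders, so the folder tree survives in the archive.
    static byte[] zip(Map<String, String> textByRelativePath) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, String> file : textByRelativePath.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the archive back into (relative path, text) pairs, in entry order.
    static Map<String, String> unzip(byte[] archive) {
        Map<String, String> textByRelativePath = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            byte[] chunk = new byte[4096];
            while ((entry = zis.getNextEntry()) != null) {
                ByteArrayOutputStream text = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry.
                while ((n = zis.read(chunk)) != -1) {
                    text.write(chunk, 0, n);
                }
                textByRelativePath.put(entry.getName(),
                        new String(text.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return textByRelativePath;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("docs/a.txt", "Lorem Ipsum is simply dummy text");
        files.put("docs/sub/b.txt", "of the printing and typesetting industry.");
        // Print the entry names read back from the archive.
        unzip(zip(files)).keySet().forEach(System.out::println);
    }
}
```

In a real job the same loop would presumably sit in the reducer and write to
an HDFS output stream instead of a byte array. Producing a single archive
would also mean a single reducer, so the zip step itself would not be
distributed.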

Third. To create a new HDFS command that zips inside Hadoop, but I'm not
sure whether Hadoop would distribute the execution; if it doesn't, it may
take a long time and consume a lot of CPU.

Any ideas?

Thanks
