Hi Andraz,

First, thanks for the contribution. Could you create a JIRA ticket and upload the code there? Due to ASF restrictions, all contributions must be attached to a JIRA so you can officially grant permission to include the code. The JIRA will also allow others to review and comment on the code.

When you attach it to the JIRA, if you could format it as a -p0 patch against the Common repository, that would also be preferred. Check out this page for further info:

http://wiki.apache.org/hadoop/HowToContribute

Thanks
-Todd

On Tue, Jul 21, 2009 at 2:12 AM, Andraz Tori <[email protected]> wrote:
> If it is useful to anyone:
> here's a codec to support getting data from .tar.gz
>
> Basically the assumption is that instead of having just one text file
> gzipped, you have many text files tared and gzipped. Therefore it just
> concatenates all the files inside .tar.gz archive.
>
> The source was based on GzipCodec.java
> It also depends on JavaTar from
> http://gjt.org/pkgdoc/com/ice/tar/index.html which is released under
> Public Domain.
>
> It passes the unit tests for codecs and we've successfully used it in
> processing around a hundred gigabytes of data.
>
> --
> Andraz Tori, CTO
> Zemanta Ltd, New York, London, Ljubljana
> www.zemanta.com
> mail: [email protected]
> tel: +386 41 515 767
> twitter: andraz, skype: minmax_test
