[ https://issues.apache.org/jira/browse/HADOOP-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781810#action_12781810 ]

Tatu Saloranta commented on HADOOP-4874:
----------------------------------------

Actually, I only now found time to spend on this, and ended up testing LZF
(http://oldhome.schmorp.de/marc/liblzf.html), as ported by the H2 team
(http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/compress/).
It turns out LZF is quite fast, although one has to be careful to choose good
buffer and hash table sizes, and ideally to reuse buffers where possible. If
so, it can be a bit faster than gzip on decompression, and a lot faster on
compression.
The numbers I saw (from initial testing only) indicated up to twice as fast
compression, and perhaps 30% faster decompression.
The compression ratio is not as good: with JSON data, gzip gave ratios of
81/93/97% (for content sizes of 2k/20k/200k), whereas LZF gave 66/72/72%
(i.e. it compresses down to 34/28/28% of the original size). That is still
pretty good, of course.
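To make the buffer and hash table points concrete, here is a minimal sketch of a block codec that emits the LZF format (literal runs of up to 32 bytes, back references of up to 264 bytes within an 8 KiB window). It is not the liblzf or H2 code: the class name LzfSketch, the 14-bit hash table and the multiplicative hash are assumptions chosen for illustration. The hash table is allocated once per instance and reused across calls, and the caller supplies the output buffer so it can be reused too.

```java
import java.util.Arrays;

/**
 * Minimal LZF-format block codec (sketch). The hash table lives in the instance
 * and is reused across compress() calls; callers supply (and can reuse) buffers.
 * Not thread-safe because of the shared hash table.
 */
final class LzfSketch {
    private static final int HASH_BITS = 14;                // assumed size: 16K entries
    private static final int HASH_SIZE = 1 << HASH_BITS;
    private static final int MAX_OFF = 1 << 13;             // 8 KiB back-reference window
    private static final int MAX_REF = (1 << 8) + (1 << 3); // longest match: 264 bytes
    private static final int MAX_LIT = 1 << 5;              // longest literal run: 32 bytes

    private final int[] hashTable = new int[HASH_SIZE];

    // Multiplicative hash of the 3 bytes at p (assumed; liblzf uses its own hash).
    private static int hash(byte[] in, int p) {
        int v = ((in[p] & 0xff) << 16) | ((in[p + 1] & 0xff) << 8) | (in[p + 2] & 0xff);
        return (v * 0x9E3779B1) >>> (32 - HASH_BITS);
    }

    /** Compresses in[0..len) into out, which needs len + len / 32 + 1 bytes; returns bytes written. */
    int compress(byte[] in, int len, byte[] out) {
        Arrays.fill(hashTable, -1);     // cost grows with table size, paid on every call
        int ip = 0, op = 1, lit = 0;    // out[0] is reserved for the first literal-run header
        while (ip < len) {
            int ref = -1, off = 0;
            if (ip + 2 < len) {
                int h = hash(in, ip);
                ref = hashTable[h];
                hashTable[h] = ip;
                off = ip - ref - 1;
            }
            if (ref >= 0 && off < MAX_OFF
                    && in[ref] == in[ip] && in[ref + 1] == in[ip + 1] && in[ref + 2] == in[ip + 2]) {
                int maxLen = Math.min(MAX_REF, len - ip);
                int matchLen = 3;
                while (matchLen < maxLen && in[ref + matchLen] == in[ip + matchLen]) matchLen++;
                // Close the pending literal run, or drop its unused header byte.
                if (lit > 0) out[op - lit - 1] = (byte) (lit - 1); else op--;
                int l = matchLen - 2;   // stored length: 1..262
                if (l < 7) {
                    out[op++] = (byte) ((l << 5) | (off >> 8));
                } else {
                    out[op++] = (byte) ((7 << 5) | (off >> 8));
                    out[op++] = (byte) (l - 7);
                }
                out[op++] = (byte) off;
                op++;                   // reserve a header byte for the next literal run
                lit = 0;
                ip += matchLen;
            } else {
                out[op++] = in[ip++];   // copy one literal byte
                if (++lit == MAX_LIT) { // run is full: write its header, reserve a new one
                    out[op - lit - 1] = (byte) (lit - 1);
                    op++;
                    lit = 0;
                }
            }
        }
        if (lit > 0) out[op - lit - 1] = (byte) (lit - 1); else op--;
        return op;
    }

    /** Decompresses in[0..len) into out; returns the number of bytes written. */
    static int decompress(byte[] in, int len, byte[] out) {
        int ip = 0, op = 0;
        while (ip < len) {
            int ctrl = in[ip++] & 0xff;
            if (ctrl < 32) {            // literal run of ctrl + 1 bytes
                int n = ctrl + 1;
                System.arraycopy(in, ip, out, op, n);
                ip += n;
                op += n;
            } else {                    // back reference: length in top 3 bits (+ extra byte)
                int l = ctrl >> 5;
                if (l == 7) l += in[ip++] & 0xff;
                int ref = op - ((ctrl & 0x1f) << 8) - (in[ip++] & 0xff) - 1;
                for (int k = 0; k < l + 2; k++) out[op++] = out[ref + k]; // byte-wise: may overlap
            }
        }
        return op;
    }
}
```

The space savings quoted above correspond to 1 - compressedLength / originalLength, so 66% means the output is 34% of the input. In this sketch a larger hash table can find more matches, but it also costs more memory and more clearing work per call, which is one reason table size needs tuning.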

LZF is a block-based algorithm just like the others, including gzip, and is
about as easy to wrap in input/output streams.
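As a rough illustration of the stream wrapping, here is a sketch that buffers writes into fixed-size blocks and writes each one as [raw length][compressed length][compressed bytes]. It uses java.util.zip.Deflater/Inflater as a stand-in block codec; an LZF block codec would plug into writeBlock() and nextBlock() the same way. The class names, the framing and the 64 KiB block size are assumptions, not an existing Hadoop or H2 format.

```java
import java.io.*;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Block-framed streams (sketch): each block is [int rawLen][int compLen][compLen bytes]. */
final class BlockStreams {
    static final int BLOCK_SIZE = 64 * 1024; // assumed block size

    static final class BlockOutputStream extends FilterOutputStream {
        private final byte[] buf = new byte[BLOCK_SIZE]; // reused for every block
        private final byte[] tmp = new byte[8192];       // reused deflate scratch space
        private final Deflater deflater = new Deflater();
        private int count;

        BlockOutputStream(OutputStream out) { super(out); }

        @Override public void write(int b) throws IOException {
            if (count == buf.length) writeBlock();
            buf[count++] = (byte) b;
        }

        @Override public void write(byte[] b, int off, int len) throws IOException {
            while (len > 0) {
                if (count == buf.length) writeBlock();
                int n = Math.min(len, buf.length - count);
                System.arraycopy(b, off, buf, count, n);
                count += n; off += n; len -= n;
            }
        }

        /** Compresses the buffered bytes (if any) and writes them as one framed block. */
        private void writeBlock() throws IOException {
            if (count == 0) return;
            deflater.reset();
            deflater.setInput(buf, 0, count);
            deflater.finish();
            ByteArrayOutputStream block = new ByteArrayOutputStream();
            while (!deflater.finished()) block.write(tmp, 0, deflater.deflate(tmp));
            DataOutputStream header = new DataOutputStream(out); // unbuffered, writes straight through
            header.writeInt(count);
            header.writeInt(block.size());
            block.writeTo(out);
            count = 0;
        }

        @Override public void flush() throws IOException { writeBlock(); out.flush(); }

        @Override public void close() throws IOException {
            writeBlock();
            deflater.end();
            out.close();
        }
    }

    static final class BlockInputStream extends InputStream {
        private final DataInputStream in;
        private final Inflater inflater = new Inflater();
        private byte[] block = new byte[0];
        private int pos;

        BlockInputStream(InputStream in) { this.in = new DataInputStream(in); }

        /** Reads and inflates the next block; returns false at end of stream. */
        private boolean nextBlock() throws IOException {
            int rawLen;
            // End of stream (this sketch does not distinguish a truncated header).
            try { rawLen = in.readInt(); } catch (EOFException e) { return false; }
            byte[] comp = new byte[in.readInt()];
            in.readFully(comp);
            inflater.reset();
            inflater.setInput(comp);
            block = new byte[rawLen];
            pos = 0;
            try {
                for (int n = 0; n < rawLen; ) {
                    int r = inflater.inflate(block, n, rawLen - n);
                    if (r == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary()))
                        throw new IOException("corrupt block");
                    n += r;
                }
            } catch (DataFormatException e) { throw new IOException(e); }
            return true;
        }

        @Override public int read() throws IOException {
            while (pos == block.length) if (!nextBlock()) return -1;
            return block[pos++] & 0xff;
        }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) return 0;
            while (pos == block.length) if (!nextBlock()) return -1;
            int n = Math.min(len, block.length - pos);
            System.arraycopy(block, pos, b, off, n);
            pos += n;
            return n;
        }

        @Override public void close() throws IOException { inflater.end(); in.close(); }
    }
}
```

Because each block is framed independently, the codec only ever sees one bounded buffer at a time, which is what makes swapping gzip/deflate for LZF mostly a matter of replacing the two per-block calls.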

I hope to find time to wrap the existing code in somewhat better packaging
(with respect to buffer reuse and other optimizations). If so, it could become
a reusable component. That may take some time, but in the meantime, the source
link above lets others try out the code if they want to.
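One common way to get the buffer reuse mentioned above is a per-thread recycler, sketched below. BufferRecycler, acquire() and release() are hypothetical names, not an existing API; holding the cached buffer through a SoftReference lets the JVM reclaim it under memory pressure.

```java
import java.lang.ref.SoftReference;

/** Hypothetical per-thread buffer recycler (sketch): keeps one released buffer per thread. */
final class BufferRecycler {
    private static final ThreadLocal<SoftReference<byte[]>> CACHE = new ThreadLocal<>();

    /** Returns a buffer of at least minSize bytes, reusing this thread's cached one if big enough. */
    static byte[] acquire(int minSize) {
        SoftReference<byte[]> ref = CACHE.get();
        byte[] buf = (ref == null) ? null : ref.get();
        if (buf != null && buf.length >= minSize) {
            CACHE.set(null);            // hand it out; the caller should release() it when done
            return buf;
        }
        return new byte[minSize];
    }

    /** Gives a buffer back so the next acquire() on this thread can reuse it. */
    static void release(byte[] buf) {
        SoftReference<byte[]> ref = CACHE.get();
        byte[] cached = (ref == null) ? null : ref.get();
        if (cached == null || buf.length > cached.length) { // keep the larger of the two
            CACHE.set(new SoftReference<>(buf));
        }
    }
}
```

A codec would call acquire() for its working buffers at the start of each call and release() them when done, so repeated (de)compression on the same thread stops allocating fresh arrays once the cache is warm.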


> Remove bindings to lzo
> ----------------------
>
>                 Key: HADOOP-4874
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4874
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: h4874.patch
>
>
> It looks like the lzo bindings are infected by lzo's GPL and must be removed 
> from Hadoop.

-- 
This message is automatically generated by JIRA.
