[jira] [Commented] (COMPRESS-111) support for lzma files

Damjan Jovanovic (JIRA) Tue, 07 May 2013 14:03:17 -0700

    [ 
https://issues.apache.org/jira/browse/COMPRESS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651308#comment-13651308
 ]


Damjan Jovanovic commented on COMPRESS-111:
-------------------------------------------

The fundamental problem is that Commons Compress does decompression via 
CompressorInputStream’s read() methods, which are a pull-model interface, while 
the LZMA SDK (in the public domain) does it with Decoder.code(), a push-model 
method that takes a compressed input stream and an output stream, then reads, 
decompresses, and writes, only returning when the entire file is decompressed. 
There is no clean way to convert this to a pull-model CompressorInputStream: 
either you have to pull in one thread while pushing from another, or push 
everything into a byte array (which needs O(n) memory!!) and then pull from it 
through a ByteArrayInputStream afterwards. Both are really ugly solutions: a 
thread per stream is heavy and creating new threads is not allowed in some 
environments (e.g. unsigned Applets and Java EE servers), while trying to 
allocate O(n) memory can throw an OutOfMemoryError that affects the entire JVM.

The Java LZMA attempts out there rate as follows:

Maurel’s patch here uses O(n) memory: it decompresses the entire stream in 
the constructor and stores it in a ByteArrayInputStream, which each read() 
then copies from.
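
To make the buffering approach concrete, here is a minimal sketch of how such 
an adapter could look. It is not the code from the patch: PushDecoder is a 
hypothetical stand-in for a push-model decoder such as the LZMA SDK's, and 
BufferingInputStream is an illustrative name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in for a push-model decoder: code() returns only once
// the whole input has been decoded and written to the output.
interface PushDecoder {
    void code(InputStream in, OutputStream out) throws IOException;
}

// Pull-model adapter that decodes everything up front and serves reads
// from an in-memory copy of the decompressed data.
class BufferingInputStream extends InputStream {
    private final ByteArrayInputStream buffer;

    BufferingInputStream(InputStream compressed, PushDecoder decoder) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        decoder.code(compressed, sink);   // all decompression happens here
        // The whole decompressed stream is now held in memory: O(n).
        buffer = new ByteArrayInputStream(sink.toByteArray());
    }

    @Override
    public int read() {
        return buffer.read();
    }

    @Override
    public int read(byte[] b, int off, int len) {
        return buffer.read(b, off, len);
    }
}
```

The constructor blocks until the entire stream is decoded, so neither memory 
use nor latency is bounded by how much the caller actually reads.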

http://jponge.github.io/lzma-java/ is licensed ASLv2 and states how it solved 
the push/pull problem: “Although not a derivate work, the streaming api classes 
were inspired from the work of Christopher League. I reused his technique of 
fake streams and working threads to pass the data around between 
encoders/decoders and "normal" Java streams.” In other words, it pushes in one 
thread and pulls in another. Actual decompression in the other thread is still 
done with the LZMA SDK, which it just wraps into an InputStream subclass.

http://contrapunctus.net/league/haques/lzmajio/ was written by Christopher 
League; it’s under “LGPL or the Common Public License” and has the same 
push-in-one-thread, pull-in-another story. It’s also just a wrapper of the 
LZMA SDK.
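
The push-in-one-thread, pull-in-another technique can be sketched with the 
JDK's piped streams. This is only an illustration under assumptions, not the 
code of either library: PushCodec is a hypothetical stand-in for the LZMA SDK 
decoder, and ThreadedInputStream is an illustrative name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical stand-in for a push-model decoder.
interface PushCodec {
    void code(InputStream in, OutputStream out) throws IOException;
}

// Pull-model adapter: a worker thread pushes decoded bytes into a pipe,
// and the caller pulls them out through read().
class ThreadedInputStream extends InputStream {
    private final PipedInputStream pipe = new PipedInputStream(8192);
    private volatile IOException failure;

    ThreadedInputStream(InputStream compressed, PushCodec decoder) throws IOException {
        PipedOutputStream sink = new PipedOutputStream(pipe);
        Thread worker = new Thread(() -> {
            try {
                decoder.code(compressed, sink);   // blocks whenever the pipe is full
            } catch (IOException e) {
                failure = e;                      // recorded before the pipe is closed
            } finally {
                try {
                    sink.close();                 // signals end of stream to the reader
                } catch (IOException ignored) {
                    // nothing useful to do here
                }
            }
        }, "push-decoder");
        worker.setDaemon(true);
        worker.start();                           // one extra thread per stream
    }

    @Override
    public int read() throws IOException {
        int b = pipe.read();
        if (b == -1 && failure != null) {
            throw failure;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = pipe.read(b, off, len);
        if (n == -1 && failure != null) {
            throw failure;
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        pipe.close();   // a writer still pushing then fails with an IOException
    }
}
```

Memory stays bounded by the pipe size, but every open stream costs a thread, 
which is exactly what restricted environments may forbid.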

http://tukaani.org/xz/java.html is in the public domain and is already used by 
Commons Compress to provide XZ compression support. It supports XZ and LZMA2 
only and supports them well: a proper pull-model InputStream with no O(n) 
memory or background threads. LZMA2 is a different file format from LZMA, but 
then again LZMA2 uses LZMA internally. I’ll have to investigate in detail.
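
For contrast, a decoder written natively in pull style keeps its decoding 
state between read() calls and does only as much work as each call needs. 
The sketch below uses a toy run-length format of (count, value) byte pairs, 
not LZMA, and says nothing about how the library above is implemented; 
RunLengthInputStream is an illustrative name.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Toy pull-model decoder. The input is a sequence of (count, value) byte
// pairs; each pair expands to `count` copies of `value`.
class RunLengthInputStream extends InputStream {
    private final InputStream in;
    private int remaining;   // bytes still to be emitted from the current run
    private int value;       // byte value of the current run

    RunLengthInputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        while (remaining == 0) {
            int count = in.read();
            if (count == -1) {
                return -1;                        // clean end of stream
            }
            int v = in.read();
            if (v == -1) {
                throw new EOFException("truncated run");
            }
            remaining = count;                    // a zero-length run is skipped
            value = v;
        }
        remaining--;
        return value;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

The state carried between calls is two ints, so memory use is constant and no 
second thread is involved.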
                
> support for lzma files
> ----------------------
>
>                 Key: COMPRESS-111
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-111
>             Project: Commons Compress
>          Issue Type: New Feature
>          Components: Compressors
>    Affects Versions: 1.0
>            Reporter: maurel jean francois
>         Attachments: compress-trunk-lzmaRev0.patch, 
> compress-trunk-lzmaRev1.patch
>
>
> adding support for compressing and decompressing of files with LZMA algorithm 
> (Lempel-Ziv-Markov chain-Algorithm)
> (see 
> http://markmail.org/search/?q=list%3Aorg.apache.commons.users/#query:list%3Aorg.apache.commons.users%2F+page:1+mid:syn4uuvbzusevtko+state:results)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
