[ https://issues.apache.org/jira/browse/COMPRESS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108019#comment-15108019 ]
Stefan Bodewig commented on COMPRESS-291:
-----------------------------------------
It would be good to know whether the issue lies in LZMA(2) itself or in the
way SevenZFile uses it - I'm afraid it's the former. Commons Compress doesn't
implement LZMA itself but uses the XZ for Java library
http://tukaani.org/xz/java.html . I'm not saying this to pass the blame but
rather to ensure people spend their energy where it is needed most. Lasse
Collin, the author of XZ for Java, is also the author of its C cousin.
When Commons Compress uses Deflate (the GZIP code, or ZIP or 7z when they use
Deflate), {{java.util.zip.Deflater}}/{{Inflater}} are at work, which are JNI
layers on top of zlib. This had better be close to the performance of zlib
:-)
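For illustration, a round trip through {{java.util.zip.Deflater}}/{{Inflater}} might look like the sketch below. The class and method names ({{DeflateRoundTrip}}, {{deflate}}, {{inflate}}) are made up for this example and are not part of Commons Compress; only the {{java.util.zip}} calls are real API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only.
final class DeflateRoundTrip {

    /** Compresses the whole input with zlib via java.util.zip.Deflater. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish(); // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib state
        }
    }

    /** Decompresses a complete zlib stream via java.util.zip.Inflater. */
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Guard against spinning forever on truncated input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("incomplete input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] restored = inflate(deflate(original));
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

The actual compression work happens in native zlib; the Java side only shuffles buffers across the JNI boundary.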
Personally I've spent quite a bit of time in our bzip2 code - which is a close
port of Julian Seward's C library - and can tell you that Java itself often is
an obstacle for efficient compression. The lack of unsigned types and the
indirect memory access - including bounds checks - of byte[]s produce very
different code from what you can do in C. In the LZ77 family of compressors you
look for matching sequences of bytes. In C you simply cast the {{char*}}
pointing to the raw data to an {{int*}} and compare four bytes at once
(sacrificing matches that are not aligned at four-byte boundaries). Using
{{sun.misc.Unsafe}} is a frowned-upon option that we've not chosen so far. LZMA
is most probably even worse since it works at the bit level rather than the
byte level.
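To make the difference concrete, here is a rough sketch of a match-length search in plain Java. It is not taken from the bzip2 code or from XZ for Java; the class {{MatchLength}} and its methods are hypothetical. The four-bytes-at-a-time variant uses {{java.nio.ByteBuffer}} rather than a pointer cast, so every read still goes through bounds checks. The caller is assumed to guarantee that {{a + max}} and {{b + max}} do not exceed {{buf.length}}.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class, for illustration only.
final class MatchLength {

    /** Counts equal bytes starting at offsets a and b, one byte at a time. */
    static int byteWise(byte[] buf, int a, int b, int max) {
        int len = 0;
        while (len < max && buf[a + len] == buf[b + len]) {
            len++;
        }
        return len;
    }

    /**
     * Same result as byteWise, but compares four bytes per step where
     * possible and finishes the tail byte by byte.
     */
    static int intWise(byte[] buf, int a, int b, int max) {
        ByteBuffer view = ByteBuffer.wrap(buf); // shares the array, no copy
        int len = 0;
        while (len + 4 <= max && view.getInt(a + len) == view.getInt(b + len)) {
            len += 4;
        }
        while (len < max && buf[a + len] == buf[b + len]) {
            len++;
        }
        return len;
    }

    /** Java bytes are signed; mask to get the unsigned value 0..255. */
    static int unsigned(byte b) {
        return b & 0xFF;
    }
}
```

Whether the {{intWise}} variant is actually faster depends on the JIT and the data; it would need measuring before drawing any conclusion.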
Enough of my rambling. To answer Robert's question: I'm not aware of anybody
actively looking into it.
> decompress .7z archive very very slow
> -------------------------------------
>
> Key: COMPRESS-291
> URL: https://issues.apache.org/jira/browse/COMPRESS-291
> Project: Commons Compress
> Issue Type: Improvement
> Components: Compressors
> Affects Versions: 1.9
> Environment: Windows 7 x64, jdk1.7.0_21 x64
> Reporter: Robert Jansen
> Priority: Minor
>
> I have 7z archives with one large image and many small files. The following
> code decompresses to a directory and returns the largest file. It is
> glacially slow and not usable for GB size files:
> public File unSevenZipToDir(File sevenZipFile, File outputDir) {
>     File imgFile = null;
>     // Make sure output dir exists
>     outputDir.mkdirs();
>     if (outputDir.exists()) {
>         // try-with-resources closes the archive even if extraction fails
>         try (SevenZFile f7z = new SevenZFile(sevenZipFile)) {
>             SevenZArchiveEntry entry;
>             long maxSize = 0;
>             byte[] data = new byte[4096];
>             while ((entry = f7z.getNextEntry()) != null) {
>                 String s = entry.getName();
>                 long sz = entry.getSize();
>                 if (s == null || sz <= 0) {
>                     continue; // skip unnamed and empty entries
>                 }
>                 String outFileName = outputDir.getPath() + "/"
>                         + new File(s).getName();
>                 File outFile = new File(outFileName);
>                 // Extract only if it does not already exist
>                 if (!outFile.exists()) {
>                     System.out.println("Extracting " + s + " => size = " + sz);
>                     try (BufferedOutputStream dest = new BufferedOutputStream(
>                             new FileOutputStream(outFile))) {
>                         int count;
>                         while ((count = f7z.read(data)) != -1) {
>                             dest.write(data, 0, count);
>                         }
>                     }
>                 } else {
>                     System.out.println("Using already Extracted " + s + " => size = " + sz);
>                 }
>                 // Parenthesized so the size check applies to every extension,
>                 // not only to ".jpg" (&& binds tighter than ||)
>                 if ((s.endsWith(".h5") || s.endsWith(".tif")
>                         || s.endsWith(".cos") || s.endsWith(".nitf")
>                         || s.endsWith(".ntf") || s.endsWith(".jpg"))
>                         && sz > maxSize) {
>                     maxSize = sz;
>                     imgFile = new File(outFileName);
>                 }
>             } // end while
>         } catch (FileNotFoundException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         } catch (IOException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         }
>     }
>     return imgFile;
> }