[jira] [Commented] (COMPRESS-291) decompress .7z archive very very slow

Dawid Weiss (JIRA) Tue, 19 Jan 2016 23:39:08 -0800

    [ https://issues.apache.org/jira/browse/COMPRESS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108171#comment-15108171 ]


Dawid Weiss commented on COMPRESS-291:
--------------------------------------

I don't think it's LZMA in my case -- the 7z archive I checked used BZip2 
streams, and BZip2 decoding was the performance bottleneck. I looked at the 
code and, as Stefan mentioned, it is a close port of the C code, but some 
things that work well in C don't translate as well to Java (a case switch for 
state transitions on every single byte read from the stream, double int[][] 
indirections, etc.). It would be fun work to try to optimize this (and to 
understand the underlying code)! 
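One common way to remove the double indirection of an int[][] lookup is to store the table as a single row-major int[] and index it as row * cols + col, so each access costs one array load and one bounds check instead of two. The sketch below is illustrative only: FlatTable is a hypothetical helper, not a class from Commons Compress, and any speedup over int[][] should be confirmed with a benchmark.

```java
/**
 * Hypothetical helper: a 2-D int table stored as one row-major int[].
 * Element (r, c) lives at index r * cols + c, so a lookup needs a single
 * array access instead of the two needed by int[][].
 */
public final class FlatTable {
    private final int cols;
    private final int[] data;

    public FlatTable(int rows, int cols) {
        this.cols = cols;
        this.data = new int[rows * cols];
    }

    public int get(int r, int c) {
        return data[r * cols + c];
    }

    public void set(int r, int c, int v) {
        data[r * cols + c] = v;
    }

    /**
     * Copies an existing int[][] into flat form. Assumes every row has the
     * same length as row 0 (a shorter row makes arraycopy throw).
     */
    public static FlatTable of(int[][] nested) {
        int rows = nested.length;
        int cols = rows == 0 ? 0 : nested[0].length;
        FlatTable t = new FlatTable(rows, cols);
        for (int r = 0; r < rows; r++) {
            System.arraycopy(nested[r], 0, t.data, r * cols, cols);
        }
        return t;
    }
}
```

In a hot decoding loop the JIT can usually inline get(), leaving only the multiply-add and one array load; hoisting r * cols out of an inner loop removes the multiply as well.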

bq. In C you simply cast the char* pointing to the raw data to an int* and 
compare four bytes at once (sacrificing matches that are not aligned at 
four-byte boundaries). Using sun.misc.Unsafe is a frowned-upon option that we 
have not chosen so far. LZMA is most probably even worse since it works at the 
bit level rather than the byte level.
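In Java, without Unsafe, a four-bytes-at-a-time comparison can be sketched with java.nio.ByteBuffer.getInt(int index), which reads four bytes at any index and therefore has no alignment restriction. This is a minimal illustration, not the code used in Commons Compress or in any benchmark mentioned in this thread; MatchLength and matchLength are hypothetical names, and whether this actually beats a plain byte loop depends on the JVM and has to be measured.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: length of the common prefix of two byte ranges. */
public final class MatchLength {

    /**
     * Returns how many bytes starting at a[aOff] and b[bOff] are equal,
     * up to max. Assumes aOff + max <= a.length and bOff + max <= b.length.
     */
    public static int matchLength(byte[] a, int aOff, byte[] b, int bOff, int max) {
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        int len = 0;
        // Word phase: compare four bytes per step (byte order is irrelevant
        // for equality) and stop at the first differing 4-byte word.
        while (len + 4 <= max && ba.getInt(aOff + len) == bb.getInt(bOff + len)) {
            len += 4;
        }
        // Byte phase: find the exact mismatch inside that word, or compare
        // the last (max - len) < 4 bytes.
        while (len < max && a[aOff + len] == b[bOff + len]) {
            len++;
        }
        return len;
    }
}
```

Unlike the C cast, this finds matches at any offset, at the cost of relying on the JIT to turn getInt into a single load.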

See this thread, Stefan -- http://markmail.org/thread/zqscb5jktgodxj5p -- there 
are ways to do this efficiently (see Aleksey Shipilev's benchmark code at the 
end of the thread), although they require Java 7.


> decompress .7z archive very very slow
> -------------------------------------
>
>                 Key: COMPRESS-291
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-291
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Compressors
>    Affects Versions: 1.9
>         Environment: Windows 7 x64, jdk1.7.0_21 x64
>            Reporter: Robert Jansen
>            Priority: Minor
>
> I have 7z archives with one large image and many small files. The following 
> code decompresses them to a directory and returns the largest image file. It 
> is glacially slow and not usable for GB-sized files:
> import java.io.BufferedOutputStream;
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import java.io.OutputStream;
> import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
> import org.apache.commons.compress.archivers.sevenz.SevenZFile;
>
> public File unSevenZipToDir(File sevenZipFile, File outputDir) {
>     File imgFile = null;
>     // Make sure output dir exists
>     outputDir.mkdirs();
>     if (outputDir.exists()) {
>         // try-with-resources closes the archive even if extraction fails
>         try (SevenZFile f7z = new SevenZFile(sevenZipFile)) {
>             SevenZArchiveEntry entry;
>             long maxSize = 0;
>             byte[] data = new byte[4096];
>             while ((entry = f7z.getNextEntry()) != null) {
>                 String s = entry.getName();
>                 long sz = entry.getSize();
>                 if (s == null || sz <= 0) {
>                     continue; // skip unnamed and empty entries
>                 }
>                 File outFile = new File(outputDir, new File(s).getName());
>                 // Extract only if it does not already exist
>                 if (!outFile.exists()) {
>                     System.out.println("Extracting " + s + " => size = " + sz);
>                     try (OutputStream dest = new BufferedOutputStream(
>                             new FileOutputStream(outFile))) {
>                         int count;
>                         while ((count = f7z.read(data)) != -1) {
>                             dest.write(data, 0, count);
>                         }
>                     }
>                 } else {
>                     System.out.println("Using already Extracted " + s + " => size = " + sz);
>                 }
>                 // Test the extensions separately so that the size check applies
>                 // to every image type, not only ".jpg" (&& binds tighter than ||)
>                 boolean isImage = s.endsWith(".h5") || s.endsWith(".tif")
>                         || s.endsWith(".cos") || s.endsWith(".nitf")
>                         || s.endsWith(".ntf") || s.endsWith(".jpg");
>                 if (isImage && sz > maxSize) {
>                     maxSize = sz;
>                     imgFile = outFile;
>                 }
>             }
>         } catch (IOException e) { // also covers FileNotFoundException
>             e.printStackTrace();
>         }
>     }
>     return imgFile;
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
