[ https://issues.apache.org/jira/browse/COMPRESS-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108171#comment-15108171 ]
Dawid Weiss commented on COMPRESS-291:
--------------------------------------
I don't think it's LZMA in my case -- the 7z archive I checked used bzip2
streams, and those were the performance bottleneck. I looked at the code, and
as Stefan mentioned, it is a close port of the C code, but some things that
work well in C don't translate as well to Java (a switch statement for state
transitions on every single byte read from the stream, double int[][]
indirections, etc.). It would be fun work to try to optimize this (and
understand the underlying code)!
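To illustrate the int[][] point: each nested lookup in Java loads a row
reference and then an element, with a bounds check at both steps. A minimal
sketch of the usual workaround, flattening the table into a single int[], is
below. The class name FlatTable and its methods are hypothetical and are not
taken from Commons Compress; whether this helps in practice depends on the JIT
and would need measuring.

```java
/** Hypothetical illustration; not code from Commons Compress. */
public final class FlatTable {
    private final int[] cells; // rows * cols values in row-major order
    private final int cols;

    /** Copies a rectangular int[][] into one contiguous array. */
    public FlatTable(int[][] nested) {
        int rows = nested.length;
        this.cols = rows == 0 ? 0 : nested[0].length;
        this.cells = new int[rows * cols];
        for (int r = 0; r < rows; r++) {
            if (nested[r].length != cols) {
                throw new IllegalArgumentException("ragged table");
            }
            System.arraycopy(nested[r], 0, cells, r * cols, cols);
        }
    }

    /** Single array access; the caller must keep col within [0, cols). */
    public int get(int row, int col) {
        return cells[row * cols + col];
    }
}
```

A lookup such as table[group][symbol] then becomes flat.get(group, symbol),
at the cost of one multiplication per access.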
bq. In C you simply cast the char* pointing to the raw data to an int* and
compare four bytes at once (sacrificing matches that are not aligned at four
byte boundaries). Using sun.misc.Unsafe is a frowned upon option that we've not
chosen so far. LZMA is most probably even worse since it works at the bit level
rather than the byte level.
See this thread, Stefan -- http://markmail.org/thread/zqscb5jktgodxj5p -- there
are ways to do it efficiently (see Aleksey Shipilev's benchmark code, at the
end of the thread), although it requires Java 7.
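For reference, one way to compare four bytes per step in plain Java without
sun.misc.Unsafe is to read ints through a ByteBuffer view of the arrays. The
sketch below is my own illustration and not necessarily the technique used in
the linked benchmark; the class and method names are hypothetical, and the
caller is assumed to guarantee that aOff + max and bOff + max stay within the
arrays.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration; not code from Commons Compress. */
public final class WordCompare {
    /**
     * Returns how many leading bytes of a[aOff..] and b[bOff..] are equal,
     * looking at no more than max bytes.
     */
    public static int matchLength(byte[] a, int aOff, byte[] b, int bOff, int max) {
        // In a hot loop the buffers would be wrapped once and reused.
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        int n = 0;
        // Compare four bytes at a time while a whole int is still in range.
        while (n + 4 <= max && ba.getInt(aOff + n) == bb.getInt(bOff + n)) {
            n += 4;
        }
        // Finish the tail, or locate the mismatch, one byte at a time.
        while (n < max && a[aOff + n] == b[bOff + n]) {
            n++;
        }
        return n;
    }
}
```

Unlike the C cast, the absolute getInt(index) calls work at any offset, so
unaligned matches are not sacrificed; whether the JIT turns them into single
word loads is something to verify with a benchmark.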
> decompress .7z archive very very slow
> -------------------------------------
>
> Key: COMPRESS-291
> URL: https://issues.apache.org/jira/browse/COMPRESS-291
> Project: Commons Compress
> Issue Type: Improvement
> Components: Compressors
> Affects Versions: 1.9
> Environment: Windows 7 x64, jdk1.7.0_21 x64
> Reporter: Robert Jansen
> Priority: Minor
>
> I have 7z archives with one large image and many small files. The following
> code decompresses to a directory and returns the largest file. It is
> glacially slow and not usable for GB size files:
> public File unSevenZipToDir(File sevenZipFile, File outputDir) {
>     File imgFile = null;
>     // Make sure output dir exists
>     outputDir.mkdirs();
>     if (outputDir.exists()) {
>         //FileInputStream stream;
>         try {
>             FileOutputStream output = null;
>             SevenZFile f7z = new SevenZFile(sevenZipFile);
>             SevenZArchiveEntry entry;
>             long maxSize = 0;
>             while ((entry = f7z.getNextEntry()) != null) {
>                 if (entry != null) {
>                     String s = entry.getName();
>                     if (s != null) {
>                         long sz = entry.getSize();
>                         if (sz > 0) {
>                             int count;
>                             byte data[] = new byte[4096];
>                             String outFileName = outputDir.getPath() + "/"
>                                     + new File(entry.getName()).getName();
>                             File outFile = new File(outFileName);
>                             // Extract only if it does not already exist
>                             if (outFile.exists() == false) {
>                                 System.out.println("Extracting " + s + " => size = " + sz);
>                                 FileOutputStream fos = new FileOutputStream(outFile);
>                                 BufferedOutputStream dest = new BufferedOutputStream(fos);
>                                 while ((count = f7z.read(data)) != -1) {
>                                     dest.write(data, 0, count);
>                                 }
>                                 dest.flush();
>                                 dest.close();
>                             } else {
>                                 System.out.println("Using already Extracted " + s + " => size = " + sz);
>                             }
>                             // Parenthesized so the size check applies to every
>                             // extension, not only ".jpg" (&& binds tighter than ||)
>                             if ((s.endsWith(".h5") || s.endsWith(".tif")
>                                     || s.endsWith(".cos") || s.endsWith(".nitf")
>                                     || s.endsWith(".ntf") || s.endsWith(".jpg"))
>                                     && sz > maxSize) {
>                                 maxSize = sz;
>                                 imgFile = new File(outFileName);
>                             }
>                         } // end sz > 0
>                     } // end s != null
>                 } // end if entry
>             } // end while
>             f7z.close();
>         } catch (FileNotFoundException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         } catch (IOException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>         }
>     }
>     return imgFile;
> }
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)