[ https://issues.apache.org/jira/browse/COMPRESS-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077436#comment-14077436 ]
Sebb commented on COMPRESS-285:
-------------------------------
I think the code could be sped up in the case where the input is not an XZ
archive.
1) Call XZCompressorInputStream.matches() first and only check whether XZ
support is available if the signature matches.
This would require making a local copy of XZ.HEADER_MAGIC; it is probably
sensible to move matches() to XZUtils as well.
2) Move the check for XZ to the end of CompressorStreamFactory so that the
other 3 formats are checked first.
Another possibility would be to add a second constructor that takes a Boolean
giving the result of isXZCompressionAvailable (null meaning unknown, in which
case the check must still be done).
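To illustrate suggestion 1 together with the Boolean idea, here is a minimal, self-contained sketch. It is not the Commons Compress code: the class XzDetectionSketch, its methods, and the expensiveCheck parameter are hypothetical names, and the only fact taken from the XZ format is the six-byte header magic.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these names exist in Commons Compress.
final class XzDetectionSketch {

    // Local copy of the six-byte XZ stream header magic: FD '7' 'z' 'X' 'Z' 00.
    private static final byte[] XZ_MAGIC = {(byte) 0xFD, '7', 'z', 'X', 'Z', 0};

    // Cheap test that needs no XZ library classes: compare the leading bytes.
    public static boolean matchesXz(byte[] signature, int length) {
        if (signature == null || length < XZ_MAGIC.length
                || signature.length < XZ_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < XZ_MAGIC.length; i++) {
            if (signature[i] != XZ_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // knownAvailability: TRUE or FALSE if the caller already knows the answer,
    // null if unknown. The expensive availability check only runs when the
    // signature looks like XZ and the answer is not yet known.
    public static boolean shouldTreatAsXz(byte[] signature, int length,
                                          Boolean knownAvailability,
                                          BooleanSupplier expensiveCheck) {
        if (!matchesXz(signature, length)) {
            return false; // not XZ, so availability never needs to be checked
        }
        if (knownAvailability != null) {
            return knownAvailability;
        }
        return expensiveCheck.getAsBoolean();
    }
}
```

With this ordering, input that is not XZ never triggers the availability check at all, which is the case the report describes.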
> checking of availability of XZ compression is expensive - result should be
> reused
> ---------------------------------------------------------------------------------
>
> Key: COMPRESS-285
> URL: https://issues.apache.org/jira/browse/COMPRESS-285
> Project: Commons Compress
> Issue Type: Improvement
> Components: Compressors
> Affects Versions: 1.5, 1.6, 1.7, 1.8
> Environment: linux 64-bit, java 7, glassfish, solr, tika
> Reporter: Wojciech Ćozowicki
> Priority: Minor
> Labels: performance
>
> I use Solr with Apache Tika for indexing documents. Tika uses
> commons-compress to handle compressed files. Using a sampler (jvisualvm), I
> have seen that quite a lot of time (5-7%) during my tests is spent in
> XZUtils.isXZCompressionAvailable because XZ compression is unavailable (I
> guess that on each call the classloaders spend some time looking for the
> missing classes and then throw NoClassDefFoundError).
> I think the result of the first check should be stored and reused.
> Here is the stacktrace (just to show the way tika is using commons-compress):
> org.apache.commons.compress.compressors.xz.XZUtils.isXZCompressionAvailable(XZUtils.java:52)
> at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:140)
> at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:95)
> at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:81)
> at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
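
The request in the description, to store the result of the first check and reuse it, could be sketched roughly as follows. This is only an illustration under stated assumptions: CachedAvailability and classPresent are hypothetical names, and the class-lookup probe is a guess at what makes the real check slow, not a copy of XZUtils.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not the actual XZUtils implementation.
final class CachedAvailability {

    private final BooleanSupplier probe;

    // null until the first check has completed; volatile for safe publication.
    private volatile Boolean cached;

    CachedAvailability(BooleanSupplier probe) {
        this.probe = probe;
    }

    // Runs the probe on first use and returns the stored result afterwards.
    // Under a race the probe may run more than once, which is harmless here
    // because every run yields the same answer.
    public boolean isAvailable() {
        Boolean result = cached;
        if (result == null) {
            result = probe.getAsBoolean();
            cached = result;
        }
        return result;
    }

    // Assumed probe: try to locate a class without initializing it.
    public static boolean classPresent(String className) {
        try {
            Class.forName(className, false,
                    CachedAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false; // LinkageError covers NoClassDefFoundError
        }
    }
}
```

A caller might hold one shared instance, for example new CachedAvailability(() -> CachedAvailability.classPresent("org.tukaani.xz.XZInputStream")), so that the failed class lookup is paid for once rather than on every detection. Note that a cached result would not notice the library being added to the classpath later.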