[ 
https://issues.apache.org/jira/browse/COMPRESS-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077436#comment-14077436
 ] 

Sebb commented on COMPRESS-285:
-------------------------------

I think it would be possible to speed up the code in the case that the archive 
is not an XZ archive.

1) Call XZCompressorInputStream.matches() first and only check whether XZ 
compression is available if the signature indicates an XZ archive.
This would require making a local copy of XZ.HEADER_MAGIC; probably sensible to 
move matches() to XZUtils as well.

2) Move the check for XZ to the end of CompressorStreamFactory so the other 3 
formats are checked first.
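
A minimal sketch of option 1, assuming a local copy of the six-byte XZ magic 
so that the signature test does not touch any XZ for Java class. The class 
name XZSignatureSketch, both method names, and the BooleanSupplier standing in 
for the expensive availability check are hypothetical, not existing Commons 
Compress API:

```java
import java.util.function.BooleanSupplier;

public class XZSignatureSketch {
    // Local copy of the XZ stream header magic: 0xFD '7' 'z' 'X' 'Z' 0x00.
    private static final byte[] XZ_HEADER_MAGIC = {
        (byte) 0xFD, '7', 'z', 'X', 'Z', 0x00
    };

    // Cheap check: true only if the first bytes equal the XZ magic.
    public static boolean matchesXZ(byte[] signature, int length) {
        if (signature == null || length < XZ_HEADER_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < XZ_HEADER_MAGIC.length; i++) {
            if (signature[i] != XZ_HEADER_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Runs the (assumed expensive) availability probe only when the
    // signature already looks like XZ; && short-circuits otherwise.
    public static boolean shouldCreateXZStream(byte[] signature, int length,
                                               BooleanSupplier availabilityProbe) {
        return matchesXZ(signature, length) && availabilityProbe.getAsBoolean();
    }
}
```

With this ordering a non-XZ input never reaches the probe, which is the case 
the issue reports as slow.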

Another possibility would be to add a second constructor that takes a Boolean 
holding the result of isXZCompressionAvailable 
(null meaning unknown, in which case the check must still be done).
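
A rough sketch of the tri-state Boolean idea, which also stores the result of 
the first check for reuse. The class XZAvailabilitySketch, its constructors 
and the injected probe are hypothetical. classpathProbe() assumes that looking 
up org.tukaani.xz.XZ is an acceptable stand-in for the real check:

```java
import java.util.function.BooleanSupplier;

public class XZAvailabilitySketch {
    // null means "unknown": the probe still has to run once.
    private Boolean xzAvailable;
    private final BooleanSupplier probe;

    // Caller does not know whether XZ is available.
    public XZAvailabilitySketch(BooleanSupplier probe) {
        this(null, probe);
    }

    // Caller may pass TRUE or FALSE to skip the probe entirely.
    public XZAvailabilitySketch(Boolean xzAvailable, BooleanSupplier probe) {
        this.xzAvailable = xzAvailable;
        this.probe = probe;
    }

    // Runs the probe at most once and reuses the stored answer afterwards.
    public synchronized boolean isXZAvailable() {
        if (xzAvailable == null) {
            xzAvailable = probe.getAsBoolean();
        }
        return xzAvailable;
    }

    // Assumed probe: XZ support counts as available if the class loads.
    public static boolean classpathProbe() {
        try {
            Class.forName("org.tukaani.xz.XZ");
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

A caller that does not know the answer could use 
new XZAvailabilitySketch(XZAvailabilitySketch::classpathProbe) and pay for the 
class lookup only on the first call.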

> checking of availability of XZ compression is expensive - result should be 
> reused
> ---------------------------------------------------------------------------------
>
>                 Key: COMPRESS-285
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-285
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Compressors
>    Affects Versions: 1.5, 1.6, 1.7, 1.8
>         Environment: linux 64-bit, java 7, glassfish, solr, tika
>            Reporter: Wojciech Łozowicki
>            Priority: Minor
>              Labels: performance
>
> I use solr with apache tika for indexing documents. Tika uses 
> commons-compress to handle compressed files. Using sampler (jvisualvm) I have 
> seen that quite a lot of time (5-7%) during my tests is spent in 
> XZUtils.isXZCompressionAvailable because XZ compression is unavailable (I 
> guess that on each call the classloaders spend some time looking for the 
> missing classes and then throw NoClassDefFoundError).
> I think the result of the first check should be stored and reused.
> Here is the stacktrace (just to show the way tika is using commons-compress):
> org.apache.commons.compress.compressors.xz.XZUtils.isXZCompressionAvailable(XZUtils.java:52)
>       at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:140)
>       at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:95)
>       at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:81)
>       at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)



--
This message was sent by Atlassian JIRA
(v6.2#6252)