Wojciech Łozowicki created COMPRESS-285:
-------------------------------------------

             Summary: checking of availability of XZ compression is expensive - result should be reused
                 Key: COMPRESS-285
                 URL: https://issues.apache.org/jira/browse/COMPRESS-285
             Project: Commons Compress
          Issue Type: Improvement
          Components: Compressors
    Affects Versions: 1.8, 1.7, 1.6, 1.5
         Environment: linux 64-bit, java 7, glassfish, solr, tika
            Reporter: Wojciech Łozowicki
            Priority: Minor


I use Solr with Apache Tika for indexing documents. Tika uses commons-compress
to handle compressed files. Using the sampler in jvisualvm, I have seen that a
significant share of time (5-7%) during my tests is spent in
XZUtils.isXZCompressionAvailable because XZ compression is unavailable (I guess
that on every call the classloaders spend some time looking for the missing
classes, and then a NoClassDefFoundError is thrown).

I think the result of the first check should be stored and reused.
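A minimal sketch of the proposed caching, assuming that XZ support counts as available when the class `org.tukaani.xz.XZInputStream` (from the XZ for Java library) can be found on the classpath. The class `XzAvailabilityCache` and its methods are hypothetical and not part of Commons Compress, and probing with `Class.forName` may differ from how `XZUtils` actually performs its check. The point is only that the expensive lookup runs once and every later call reads a stored result.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Commons Compress: caches the result of a
// one-time classpath probe for XZ support.
public final class XzAvailabilityCache {
    // Assumption: the presence of this class means XZ support is usable.
    private static final String XZ_PROBE_CLASS = "org.tukaani.xz.XZInputStream";

    // Counts how often the expensive probe really runs (for demonstration only).
    private static final AtomicInteger PROBES = new AtomicInteger();

    private XzAvailabilityCache() {
    }

    // Lazy-holder idiom: the JVM initialises Holder at most once, on first
    // access, and class initialisation is thread-safe, so probe() runs once.
    private static final class Holder {
        static final boolean AVAILABLE = probe();
    }

    private static boolean probe() {
        PROBES.incrementAndGet();
        try {
            // initialize=false: only check that the class can be located.
            Class.forName(XZ_PROBE_CLASS, false, XzAvailabilityCache.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing or broken XZ classes: treat XZ as unavailable.
            return false;
        }
    }

    // Cheap after the first call: just reads the cached constant.
    public static boolean isXzAvailable() {
        return Holder.AVAILABLE;
    }

    public static int probeCount() {
        return PROBES.get();
    }

    public static void main(String[] args) {
        // Simulate many detection calls, as in the Tika stack trace below.
        for (int i = 0; i < 1000; i++) {
            isXzAvailable();
        }
        System.out.println("available=" + isXzAvailable() + ", probes=" + probeCount());
    }
}
```

The lazy-holder idiom avoids explicit locking by relying on the JVM's one-time class initialisation. The trade-off of any such cache is that the answer is fixed for the lifetime of the class loader, so XZ classes added to the classpath later would not be noticed.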

Here is the stack trace (just to show how Tika uses commons-compress):
        at org.apache.commons.compress.compressors.xz.XZUtils.isXZCompressionAvailable(XZUtils.java:52)
        at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:140)
        at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:95)
        at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:81)
        at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
