[ 
https://issues.apache.org/jira/browse/COMPRESS-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092467#comment-14092467
 ] 

Stefan Bodewig commented on COMPRESS-285:
-----------------------------------------

Thanks Sebb, I think your two suggestions are good ideas and will see to 
implementing them the coming week, in particular you will only pay for the 
failed XZ check if you are really trying to uncompress XZ streams.

The additional constructor won't help Wojciech since he's using Compress behind 
Tika, Tika would need to get adapted to the new constuctor and in the end 
implement its own logic which would also need to take OSGi contexts into 
account.  I think it might be a good idea to add an explicit flag whether the 
result is cacheable and make that flag default to true unless BundleEvent can 
be loaded - Wojciech would then need to set the flag explicitly.

> checking of availability of XZ compression is expensive - result should be 
> reused
> ---------------------------------------------------------------------------------
>
>                 Key: COMPRESS-285
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-285
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Compressors
>    Affects Versions: 1.5, 1.6, 1.7, 1.8
>         Environment: linux 64-bit, java 7, glassfish, solr, tika
>            Reporter: Wojciech Ɓozowicki
>            Priority: Minor
>              Labels: performance
>
> I use solr with apache tika for indexing documents. Tika uses 
> commons-compress to handle compressed files. Using sampler (jvisualvm) I have 
> seen that quite a lot of time (5-7%) during my tests is spent in 
> XZUtils.isXZCompressionAvailable because of unavailable XZ compression (I 
> guess for each time classloaders spend some time looking for unavailable 
> classes, then NoClassDefFoundError).
> I think the result of the first check should be stored and reused.
> Here is the stacktrace (just to show the way tika is using commons-compress):
> org.apache.commons.compress.compressors.xz.XZUtils.isXZCompressionAvailable(XZUtils.java:52)
>       at 
> org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:140)
>       at 
> org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:95)
>       at 
> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:81)
>       at 
> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to