[https://issues.apache.org/jira/browse/TIKA-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883314#comment-16883314]

Hudson commented on TIKA-1568:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #207 (See 
[https://builds.apache.org/job/tika-branch-1x/207/])
TIKA-1568 -- statically cache encoding detector in AutoDetectReader when 
(tallison: 
[https://github.com/apache/tika/commit/830094e36c478fa83df3dae7a83a235ee2177782])
* (add) tika-parsers/src/test/java/org/apache/tika/parser/AutoDetectReaderParserTest.java
* (edit) tika-core/src/main/java/org/apache/tika/detect/AutoDetectReader.java
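
The commit title describes statically caching the encoding detector in AutoDetectReader. Below is a minimal sketch of that general idea, not the committed code. `EncodingDetector`, `loadEncodingDetectors()` and `CachingReader` are hypothetical stand-ins (the real expensive lookup is `ServiceLoader.loadServiceProviders(EncodingDetector.class)`). The counter exists only to show that the lookup runs once, however many readers are created.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class StaticDetectorCacheSketch {

    // Hypothetical stand-in for org.apache.tika.detect.EncodingDetector.
    interface EncodingDetector {
        String name();
    }

    // Counts how many times the (expensive) lookup actually runs.
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    // Hypothetical stand-in for ServiceLoader.loadServiceProviders(EncodingDetector.class),
    // which scans the class path for service files on every call.
    static List<EncodingDetector> loadEncodingDetectors() {
        LOOKUPS.incrementAndGet();
        EncodingDetector detector = () -> "hypothetical-detector";
        return Collections.singletonList(detector);
    }

    // Initialization-on-demand holder: the JVM runs this static initializer
    // exactly once, lazily and thread-safely, on first access to LIST.
    private static final class DefaultDetectors {
        static final List<EncodingDetector> LIST =
                Collections.unmodifiableList(loadEncodingDetectors());
    }

    // Hypothetical reader: every instance reuses the cached list instead of
    // repeating the lookup in its constructor.
    static final class CachingReader {
        final List<EncodingDetector> detectors;

        CachingReader() {
            this.detectors = DefaultDetectors.LIST;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            new CachingReader();
        }
        System.out.println("lookups: " + LOOKUPS.get());
    }
}
```

Running `main` should print `lookups: 1`. The holder idiom avoids explicit locking. The trade-off is that the cached list reflects whatever providers were visible the first time it was touched. That is only appropriate for the default, statically configured loader the report describes.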


> AutoDetectReader performance problem
> ------------------------------------
>
>                 Key: TIKA-1568
>                 URL: https://issues.apache.org/jira/browse/TIKA-1568
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: Andrzej Bialecki 
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.22
>
>
> When parsing many text files, performance suffers from repeated calls to
> ServiceLoader.loadServiceProviders(EncodingDetector.class). This happens in
> TXTParser, HTMLParser and SourceCodeParser. In most cases, when Tika is using
> the default ServiceLoader instance created in the parser's static section,
> this cost can be avoided by caching the resulting List<EncodingDetector>
> at a higher level in the parser (as a static property). When custom
> ServiceLoaders are used, the same effect can be achieved either by putting
> this list in the ParseContext or by caching these lists at a lower level, in
> the ServiceLoader component.
> The relevant part of the stack trace follows:
> {code}
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at java.util.zip.ZipFile.getEntry(ZipFile.java:304)
>       - locked <0x00000007909d2e48> (a java.util.jar.JarFile)
>       at java.util.jar.JarFile.getEntry(JarFile.java:227)
>       at java.util.jar.JarFile.getJarEntry(JarFile.java:210)
>       at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:840)
>       at sun.misc.URLClassPath$JarLoader.findResource(URLClassPath.java:818)
>       at sun.misc.URLClassPath$1.next(URLClassPath.java:226)
>       at sun.misc.URLClassPath$1.hasMoreElements(URLClassPath.java:236)
>       at java.net.URLClassLoader$3$1.run(URLClassLoader.java:583)
>       at java.net.URLClassLoader$3$1.run(URLClassLoader.java:581)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at java.net.URLClassLoader$3.next(URLClassLoader.java:580)
>       at java.net.URLClassLoader$3.hasMoreElements(URLClassLoader.java:605)
>       at java.util.Collections.list(Collections.java:3687)
>       at org.eclipse.jetty.webapp.WebAppClassLoader.toList(WebAppClassLoader.java:337)
>       at org.eclipse.jetty.webapp.WebAppClassLoader.getResources(WebAppClassLoader.java:321)
>       at org.apache.tika.config.ServiceLoader.findServiceResources(ServiceLoader.java:210)
>       at org.apache.tika.config.ServiceLoader.identifyStaticServiceProviders(ServiceLoader.java:277)
>       at org.apache.tika.config.ServiceLoader.loadStaticServiceProviders(ServiceLoader.java:306)
>       at org.apache.tika.config.ServiceLoader.loadServiceProviders(ServiceLoader.java:228)
>       at org.apache.tika.detect.AutoDetectReader.<init>(AutoDetectReader.java:104)
>       at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:70)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
>       at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
> ...
> {code}
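
The report also suggests caching at a lower level, in the ServiceLoader component. One way to do that, sketched here under stated assumptions, is to memoize the provider list per (class loader, service interface) pair. The `getResources` scan visible in the stack trace would then run once per web app rather than once per parse. `scanProviders` and `providers` are hypothetical names, and the returned strings merely stand in for detector instances. This is not Tika's ServiceLoader API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PerLoaderProviderCacheSketch {

    // Counts how many times the expensive scan actually runs.
    static final AtomicInteger SCANS = new AtomicInteger();

    // Hypothetical stand-in for the class-path scan behind loadServiceProviders(...);
    // in the stack trace this is the getResources(...) call that blocks on the JarFile lock.
    static List<String> scanProviders(ClassLoader loader, Class<?> iface) {
        SCANS.incrementAndGet();
        List<String> names = new ArrayList<>();
        names.add(iface.getName() + " provider (hypothetical)");
        return Collections.unmodifiableList(names);
    }

    // One inner map per class loader, so web apps with separate class loaders
    // never share provider lists. Assumes a non-null loader (ConcurrentHashMap rejects null keys).
    private static final Map<ClassLoader, Map<Class<?>, List<String>>> CACHE =
            new ConcurrentHashMap<>();

    // Returns the cached list, scanning only on the first request for this pair.
    static List<String> providers(ClassLoader loader, Class<?> iface) {
        return CACHE
                .computeIfAbsent(loader, l -> new ConcurrentHashMap<>())
                .computeIfAbsent(iface, i -> scanProviders(loader, iface));
    }

    public static void main(String[] args) {
        ClassLoader loader = PerLoaderProviderCacheSketch.class.getClassLoader();
        for (int i = 0; i < 1000; i++) {
            providers(loader, Runnable.class);
        }
        System.out.println("scans: " + SCANS.get());
    }
}
```

Keying on the class loader keeps the lists of different web apps apart. However, the map holds strong references to those loaders. In a container that redeploys applications, a real implementation would probably need weak keys or an explicit clear on undeploy to avoid leaking them.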



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
