[ https://issues.apache.org/jira/browse/TIKA-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886089#comment-15886089 ]
Hudson commented on TIKA-2276:
------------------------------
UNSTABLE: Integrated in Jenkins build tika-2.x #223 (See
[https://builds.apache.org/job/tika-2.x/223/])
TIKA-2276 try to reuse parsers from ParseContext rather than creating
(tallison: rev 35756b142697fe4a2e996a43dd58a6f9d5e55c05)
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
* (edit) tika-core/src/main/java/org/apache/tika/extractor/EmbeddedDocumentUtil.java
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/chm/ChmParser.java
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/JackcessExtractor.java
> Try to be more parsimonious creating TikaConfigs and ParseContexts
> ------------------------------------------------------------------
>
> Key: TIKA-2276
> URL: https://issues.apache.org/jira/browse/TIKA-2276
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Assignee: Tim Allison
> Fix For: 2.0, 1.15
>
>
> If we run AutoDetectParser() against the files in our unit tests (around
> 600 files*), TikaConfig is instantiated 701 times and the run takes around
> 20 seconds. If we modify AutoDetectParser to pass its TikaConfig via the
> ParseContext when one isn't already specified, that drops to 234
> instantiations, and parse time drops to ~17 seconds.
> Let's make this simple change and look for other places where we can reduce
> the number of times our parsers create a new TikaConfig.
> *Note: I did not include the testCHM2.chm monster in these runs.
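
The idea in the description above can be sketched in isolation. This is a rough illustration only, not the committed change: Config, Context, OuterParser and resolveConfig are hypothetical stand-ins written for this example (they are not Tika classes), and the counts it produces say nothing about the 701/234 figures quoted above. It assumes a context that is a simple type-keyed map and a config whose construction we want to count.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigReuseSketch {

    /** Hypothetical stand-in for a configuration object that is costly to build. */
    static final class Config {
        static int instantiations = 0;
        Config() { instantiations++; }
    }

    /** Hypothetical stand-in for a per-parse context: a map keyed by type. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();
        <T> void set(Class<T> key, T value) { entries.put(key, value); }
        <T> T get(Class<T> key) { return key.cast(entries.get(key)); }
    }

    /** Embedded-document handling: reuse the config in the context, else build one. */
    static Config resolveConfig(Context context) {
        Config shared = context.get(Config.class);
        return shared != null ? shared : new Config();
    }

    /** Top-level parser that owns a single config for its lifetime. */
    static final class OuterParser {
        private final Config config = new Config();
        private final boolean shareViaContext;

        OuterParser(boolean shareViaContext) { this.shareViaContext = shareViaContext; }

        void parse(int embeddedDocs, Context context) {
            // Fill the slot only if the caller has not already supplied a config.
            if (shareViaContext && context.get(Config.class) == null) {
                context.set(Config.class, config);
            }
            for (int i = 0; i < embeddedDocs; i++) {
                resolveConfig(context);
            }
        }
    }

    /** Parses the given number of documents and reports how many configs were built. */
    static int countInstantiations(int documents, int embeddedPerDocument,
                                   boolean shareViaContext) {
        Config.instantiations = 0;
        OuterParser parser = new OuterParser(shareViaContext);
        for (int d = 0; d < documents; d++) {
            parser.parse(embeddedPerDocument, new Context());
        }
        return Config.instantiations;
    }

    public static void main(String[] args) {
        System.out.println("without sharing: " + countInstantiations(3, 2, false));
        System.out.println("with sharing:    " + countInstantiations(3, 2, true));
    }
}
```

The null check before set() is the "if one isn't already specified" part: a config placed in the context by the caller is left alone, and only an empty slot is filled with the parser's own instance.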
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)