[ https://issues.apache.org/jira/browse/TIKA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834827#comment-16834827 ]
Tim Allison commented on TIKA-2864:
-----------------------------------
{{CompositeParser#getParsers(ParseContext context)}} is far less efficient than
{{CompositeParser#getAllComponentParsers()}}. I swapped the former for the
latter, and RFC822 parsing times went back to what they were before. I
also cleaned up some uncached {{getSupportedTypes()}} implementations in a few of the parsers.
The difference between 1.20 and 1.21 is that the RFC822 parser was looking for either
the TXTParser (which isn't loaded by default) or the TextAndCSVParser.
Apparently, looking for a non-existent parser with
{{CompositeParser#getParsers()}} took quite a while.
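Below is a simplified, self-contained model of the two lookups described above. `SimpleParser`, `SimpleComposite`, `CachedTextParser`, `UnloadedParser`, and `findByClass` are hypothetical names, not Tika classes, and media types are plain strings instead of Tika's `MediaType`. The sketch assumes that `getParsers(...)` builds a fresh type-to-parser map on every call by asking each component for its supported types, while `getAllComponentParsers()` just hands back the existing component list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for a parser; NOT Tika's Parser interface.
interface SimpleParser {
    Set<String> getSupportedTypes();
}

// Hypothetical parser whose supported types are built once and reused,
// instead of allocating a new set on every getSupportedTypes() call.
final class CachedTextParser implements SimpleParser {
    private static final Set<String> TYPES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("text/plain", "text/csv")));

    @Override
    public Set<String> getSupportedTypes() {
        return TYPES;
    }
}

// Hypothetical parser that is never registered, playing the role of a
// parser that isn't loaded by default.
final class UnloadedParser implements SimpleParser {
    private static final Set<String> TYPES = Collections.singleton("text/x-unloaded");

    @Override
    public Set<String> getSupportedTypes() {
        return TYPES;
    }
}

// Hypothetical simplified composite; NOT Tika's CompositeParser.
final class SimpleComposite {
    private final List<SimpleParser> parsers;

    SimpleComposite(List<SimpleParser> parsers) {
        this.parsers = new ArrayList<>(parsers);
    }

    // Analogue of getParsers(context): rebuilds a type -> parser map on
    // every call, querying every component's supported types.
    Map<String, SimpleParser> getParsers() {
        Map<String, SimpleParser> map = new HashMap<>();
        for (SimpleParser p : parsers) {
            for (String type : p.getSupportedTypes()) {
                map.put(type, p);
            }
        }
        return map;
    }

    // Analogue of getAllComponentParsers(): returns a read-only view of
    // the existing list without touching supported types at all.
    List<SimpleParser> getAllComponentParsers() {
        return Collections.unmodifiableList(parsers);
    }

    // Finds a registered component by class with a single list scan;
    // returns null when no such parser is registered.
    <T extends SimpleParser> T findByClass(Class<T> clazz) {
        for (SimpleParser p : getAllComponentParsers()) {
            if (clazz.isInstance(p)) {
                return clazz.cast(p);
            }
        }
        return null;
    }
}
```

In this model, a lookup by class costs one `isInstance` check per registered parser, whether or not the target exists, whereas the map-building path also calls `getSupportedTypes()` on every component and allocates a new map each time. Keeping each parser's type set in a static final field avoids rebuilding that set on every call.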
> Fix regression in RFC822 parsing time
> -------------------------------------
>
> Key: TIKA-2864
> URL: https://issues.apache.org/jira/browse/TIKA-2864
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Blocker
>
> In running the regression tests, we found a 1000x slowdown on RFC822 files in
> the full batch run. When we try to reproduce this locally, even multithreaded,
> we can only replicate a 10x slowdown, but that is at least reproducible.