[
https://issues.apache.org/jira/browse/TIKA-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953639#comment-16953639
]
Tim Allison commented on TIKA-2966:
-----------------------------------
Ideally this would run in streaming mode, handling text as it arrives via
{{characters()}}, but tokenization is critical and we can't guarantee that
parsers will call {{characters()}} on logical chunks, so a token could be
split across two calls.
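
A minimal sketch of the concern, assuming a plain SAX DefaultHandler and a
whitespace tokenizer. The class name, the stat keys, and the Map standing in
for a metadata object are all hypothetical; none of this is tika-eval code.
The handler buffers everything passed to characters() and tokenizes only in
endDocument(), and it also keeps a naive per-chunk count to show how the two
can disagree when a parser splits a word across calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only, not part of Tika or tika-eval.
class BufferingStatsHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    // Stand-in for a metadata object (assumption for this sketch).
    private final Map<String, String> stats = new LinkedHashMap<>();
    // Token count we would get by tokenizing each characters() call alone.
    private int perChunkTokenCount = 0;

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
        perChunkTokenCount += countTokens(new String(ch, start, length));
    }

    @Override
    public void endDocument() {
        // Tokenize once, over the full text, so chunk boundaries don't matter.
        stats.put("tokenCount",
                Integer.toString(countTokens(buffer.toString())));
        stats.put("naivePerChunkTokenCount",
                Integer.toString(perChunkTokenCount));
    }

    // Simplistic whitespace tokenizer, used here only for the demonstration.
    static int countTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    Map<String, String> getStats() {
        return stats;
    }

    // Simulates a parser that splits "hello world" mid-word across two calls.
    static Map<String, String> demo() {
        BufferingStatsHandler handler = new BufferingStatsHandler();
        for (String chunk : new String[] {"hello wo", "rld"}) {
            char[] c = chunk.toCharArray();
            handler.characters(c, 0, c.length);
        }
        handler.endDocument();
        return handler.getStats();
    }
}
```

With the chunks "hello wo" and "rld", the buffered count sees two tokens while
the per-chunk count sees three. The cost of buffering is that the whole text
is held in memory until endDocument().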
> Create a tika-eval SAXHandler
> -----------------------------
>
> Key: TIKA-2966
> URL: https://issues.apache.org/jira/browse/TIKA-2966
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
>
> One of the improvements coming in 1.23 is the decoupling of the text stats
> calculator from the tika-eval app. To make this even easier to use, let's
> add a handler that will calculate the text stats on .endDocument() and record
> those stats in a metadata object.