[ https://issues.apache.org/jira/browse/TIKA-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2972:
------------------------------
Summary: Allow users to specify a list/map of ContentHandlerFactories in
tika-config.xml (was: Allow users to specify a ContentHandlerFactory in
tika-config.xml)
> Allow users to specify a list/map of ContentHandlerFactories in
> tika-config.xml
> -------------------------------------------------------------------------------
>
> Key: TIKA-2972
> URL: https://issues.apache.org/jira/browse/TIKA-2972
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
>
> I'd like to add a tika-eval handler that will calculate text stats at the end
> of parsing a document so that the user can get a unified/simpler view of the
> number of tokens, out-of-vocabulary tokens, etc. in the metadata rather than
> having to run their own post-parse process on the content.
> The problem comes with integrating this into tika-app and tika-server --
> tika-app balloons to 134MB. I don't want to nearly double the size of
> tika-app just so that I can add some stuff that very few folks will use.
> I think we've discussed this option before, but it would be handy to allow
> users to specify a ContentHandlerFactory or possibly a map of
> ContentHandlerFactories in tika-config.xml so that users can get custom
> handling in tika-app and tika-server.
> The idea of a map of ContentHandlerFactories would be to have a name for
> each content handler factory, so that a user could call different handlers
> on tika-server like this:
> `curl... http://localhost:9998/tika/custom/myhandler1`
> `curl... http://localhost:9998/tika/custom/myhandler2`
> or in tika-app:
> `java -jar tika-app.jar --handlerFactory=myhandler1...`
> WDYT?
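A rough sketch of the named-factory idea from the description, in Java. Everything in it is hypothetical: `HandlerFactory` is a local stand-in for Tika's ContentHandlerFactory (so the snippet compiles with only the JDK), and `HandlerFactoryRegistry`, `CharCountHandler`, and the lookup by name are assumptions about how a map loaded from tika-config.xml might be used, not an existing Tika API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerFactoryRegistry {

    /** Hypothetical stand-in for a content handler factory; not Tika's interface. */
    public interface HandlerFactory {
        ContentHandler newHandler();
    }

    /** Illustrative handler that counts the characters it receives. */
    public static class CharCountHandler extends DefaultHandler {
        private long count;

        @Override
        public void characters(char[] ch, int start, int length) {
            count += length;
        }

        public long getCount() {
            return count;
        }
    }

    // Name -> factory, as it might be populated from a config file.
    private final Map<String, HandlerFactory> factories = new LinkedHashMap<>();

    public void register(String name, HandlerFactory factory) {
        factories.put(name, factory);
    }

    /** Returns a fresh handler for the given name, or fails on an unknown name. */
    public ContentHandler newHandler(String name) {
        HandlerFactory factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown handler factory: " + name);
        }
        return factory.newHandler();
    }

    public static void main(String[] args) throws Exception {
        HandlerFactoryRegistry registry = new HandlerFactoryRegistry();
        registry.register("myhandler1", DefaultHandler::new);
        registry.register("myhandler2", CharCountHandler::new);

        // The name would come from the request path or a command-line option.
        ContentHandler handler = registry.newHandler("myhandler2");
        char[] text = "hello".toCharArray();
        handler.characters(text, 0, text.length);
        System.out.println(((CharCountHandler) handler).getCount()); // prints 5
    }
}
```

In this sketch each lookup returns a new handler instance, so per-document state such as the character count is not shared between requests. An unknown name fails fast rather than silently falling back to a default handler; whether that is the right behavior for tika-server is an open design question.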
--
This message was sent by Atlassian Jira
(v8.3.4#803005)