[ https://issues.apache.org/jira/browse/TIKA-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964044#comment-16964044 ]
Tim Allison commented on TIKA-2972:
-----------------------------------

Thank you, [~nick]. Y, agreed...

bq. I guess we'd provide a method on `TikaConfig` to get all the factories as a minimum? Possibly also one that takes a name that returns a factory, not sure if that should have an implicit default or take an explicit default or return null or throw exception on an invalid name?

Y, that's what I was thinking.

> Allow users to specify a list/map of ContentHandlerFactories in
> tika-config.xml
> -------------------------------------------------------------------------------
>
> Key: TIKA-2972
> URL: https://issues.apache.org/jira/browse/TIKA-2972
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
>
> I'd like to add a tika-eval handler that will calculate text stats at the end
> of parsing a document so that the user can get a unified/simpler view of
> number of tokens/ out of vocabulary, etc. in the metadata rather than having
> to run their own post-parse process on the content.
> The problem comes with integrating this into tika-app and tika-server --
> tika-app balloons to 134MB. I don't want to nearly double the size of
> tika-app just so that I can add some stuff that very few folks will use.
> I think we've discussed this option before, but it would be handy to allow
> users to specify a ContentHandlerFactory or possibly a map of
> ContentHandlerFactories in tika-config.xml so that users can get custom
> handling in tika-app and tika-server.
> The idea of a map of ContentHandlerFactories, would be to have a name for
> each content handler factory, and a user could call different handlers on
> tika-server like this:
> -{{curl... http://localhost:9998/tika/custom/myhandler1}}-
> -{{curl... http://localhost:9998/tika/custom/myhandler2}}-
> That's not right because we'd want to differentiate classic Tika parsing and
> the RecursiveParserWrapper...
> {{curl... http://localhost:9998/tika/myhandler1}}
> {{curl... http://localhost:9998/tika/myhandler2}}
> {{curl... http://localhost:9998/rmeta/myhandler1}}
> {{curl... http://localhost:9998/rmeta/myhandler2}}
> or in tika-app:
> {{java -jar tika-app.jar --handlerFactory=myhandler1...}}
> WDYT?
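
To make the open question about lookup behaviour concrete, here is a rough sketch of what a name-to-factory registry might look like. Everything in it is an assumption for illustration: `HandlerFactoryRegistry` and its methods are hypothetical names, not existing `TikaConfig` API, and a plain `Supplier<ContentHandler>` stands in for Tika's `ContentHandlerFactory` so the snippet compiles against the JDK alone. It shows two of the options raised in the comment side by side: throwing on an unknown name, and taking an explicit default.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;
import org.xml.sax.ContentHandler;

// Hypothetical sketch only: not part of Tika. A Supplier<ContentHandler>
// stands in for a ContentHandlerFactory loaded from tika-config.xml.
final class HandlerFactoryRegistry {

    // Insertion order is kept so getAll() lists factories in config order.
    private final Map<String, Supplier<ContentHandler>> factories = new LinkedHashMap<>();

    // Registers a factory under the name used on the URL or command line.
    void register(String name, Supplier<ContentHandler> factory) {
        factories.put(name, factory);
    }

    // "Get all the factories": a read-only view of the name -> factory map.
    Map<String, Supplier<ContentHandler>> getAll() {
        return Collections.unmodifiableMap(factories);
    }

    // Option A: fail loudly on an invalid name.
    Supplier<ContentHandler> get(String name) {
        Supplier<ContentHandler> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown handler factory: " + name);
        }
        return factory;
    }

    // Option B: the caller supplies an explicit default for unknown names.
    Supplier<ContentHandler> getOrDefault(String name, Supplier<ContentHandler> fallback) {
        return factories.getOrDefault(name, fallback);
    }
}
```

Under this sketch, a server endpoint such as `/tika/myhandler1` would presumably want the throwing variant so that a mistyped name surfaces as an error, while a caller with a sensible fallback could use the explicit-default variant.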