[
https://issues.apache.org/jira/browse/TIKA-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851822#comment-16851822
]
Jonathan Essex commented on TIKA-2882:
--------------------------------------
I built the 2.x branch and deployed tika-core and tika-parser-modules to my own
Maven repo. Then I added the OOXML parser from tika-parser-modules as a
dependency for my app and built a couple of test cases.
Unsurprisingly, I had *FAR* fewer dependency issues. In fact, the whole thing
went remarkably smoothly. The only problems were a failing BundleIT test in
core related to OSGi (I just commented it out...) and the fact that
tika-parser-office-module includes slf4j-log4j12 (I just removed it...).
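For anyone who would rather not patch the module itself, a Maven exclusion on
the consuming side should have the same effect. This is only a sketch: the
groupId and the 2.0.0-SNAPSHOT version are assumptions about how a locally
built 2.x branch gets published, so adjust them to whatever your own build
actually deploys.

```xml
<dependency>
  <!-- Assumed coordinates for the locally built 2.x office parser module -->
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-office-module</artifactId>
  <version>2.0.0-SNAPSHOT</version>
  <exclusions>
    <!-- Keep the module's SLF4J binding off the application classpath -->
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With the binding excluded, the application is free to supply whichever SLF4J
backend it already uses.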
Given how much more useful this makes Tika as a library, I think it would be a
great shame to allow tika-parser-modules to languish in a development branch
any longer than it has to. What can I do to help (given my limited
experience of the Tika codebase)?
> Parsers should not include HTTP client code
> -------------------------------------------
>
> Key: TIKA-2882
> URL: https://issues.apache.org/jira/browse/TIKA-2882
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.21
> Reporter: Jonathan Essex
> Priority: Major
>
> Folks, does it really make sense for a parser to have a REST client built in?
> The GROBID and NLTKNERecogniser parsers use the Apache CXF client directly.
>
> Since I don't use CXF and my entire app is built on a different JAX-RS stack,
> this just dropped me straight into dependency hell.
> Surely it would make more sense to keep the parsers... well, parsers... and
> build support for delegating parsing to other services into some higher level
> in the stack (such as the server, where the CXF dependency is more benign).
>