Hi Nick
On 18/06/14 17:07, Nick Burch wrote:
On Wed, 18 Jun 2014, Sergey Beryozkin wrote:
The reason we need it is that CXF cannot ship all of the Tika parser
dependencies, because CXF will only offer a light-weight Tika-aware
handler.

Sounds like you just want to depend on tika-core then, and not
tika-parsers. That'll give you mime magic detection, and all the parser
framework, but no parsers, and none of the parser dependencies. (You
could manually pull in one or two parsers + their dependencies if you
wanted to)

Yes, depending on tika-core only made our main source code compile, and adding tika-parsers with the test scope made the tests using PDFParser pass. Thanks for the hint, I did not know tika-core was enough.

So the dependency management issue is then passed on to the future users of our API. The use case we target is something like this: a CXF user has a custom application that accepts documents in a limited set of formats (say, PDF and Word, or Excel only, or a Photoshop-like application that manages only a few image types). We tell this user that CXF can help with searching through these documents and that it can be integrated into the application. We tell the user to add the tika-parsers dependency, and the user asks us how to add only the PDF and Excel dependencies.

I don't want to recommend that they go through the exclusion process and possibly check the source tree, as you suggested in the other email :-)

Is tika-parsers effectively a collection of various parser dependencies, with no common dependencies shared by all the parser implementations, and with tika-core providing the supporting framework? If so, why don't we document which well-known modules support which file formats? That would let users ignore tika-parsers altogether and pick the dependencies they need by checking the docs.
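As a rough illustration of what the light-weight handler could do once users add only the parsers they need, here is a minimal sketch that reports at runtime which parser classes are on the classpath. It assumes the Tika 1.x parser class names listed below (PDFParser, OfficeParser, OOXMLParser); the class TikaParserCheck and its helpers are hypothetical, not part of CXF or Tika. It only checks that each Tika parser class can be loaded, not that the parser's own third-party libraries (e.g. PDFBox or POI) are present.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of CXF or Tika.
public class TikaParserCheck {

    // Returns true if the named class can be loaded from the current
    // classpath; the class is not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaParserCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose superclass (e.g. from tika-core)
            // cannot be found.
            return false;
        }
    }

    // Maps a format label to whether the corresponding Tika 1.x parser
    // class (assumed names) is loadable.
    static Map<String, Boolean> availableParsers() {
        Map<String, String> parsers = new LinkedHashMap<>();
        parsers.put("PDF", "org.apache.tika.parser.pdf.PDFParser");
        parsers.put("Word/Excel (OLE2)", "org.apache.tika.parser.microsoft.OfficeParser");
        parsers.put("Word/Excel (OOXML)", "org.apache.tika.parser.microsoft.ooxml.OOXMLParser");

        Map<String, Boolean> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : parsers.entrySet()) {
            result.put(e.getKey(), isOnClasspath(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        availableParsers().forEach((format, present) ->
            System.out.println(format + ": " + (present ? "available" : "missing")));
    }
}
```

A handler compiled against tika-core only could use a check like this to log which formats it can actually index, rather than failing later on an unsupported document.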

Sergey


Nick

