Re: Can some of tika-parsers module dependencies be made optional ?

Sergey Beryozkin Thu, 19 Jun 2014 02:33:13 -0700

Hi Nick
On 18/06/14 17:07, Nick Burch wrote:

On Wed, 18 Jun 2014, Sergey Beryozkin wrote:

The reason we need it is that CXF cannot ship all of the Tika parser
dependencies, because CXF will only offer a lightweight Tika-aware
handler.


Sounds like you just want to depend on tika-core then, and not
tika-parsers. That'll give you mime magic detection, and all the parser
framework, but no parsers, and none of the parser dependencies. (You
could manually pull in one or two parsers + their dependencies if you
wanted to.)

Yes, depending on tika-core only made our main source code compile, and adding tika-parsers with test scope made the tests that use PDFParser pass. Thanks for the hint, I did not know tika-core was enough.
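To illustrate what tika-core alone provides, here is a minimal sketch assuming a Tika 1.x `tika-core` jar on the classpath and no `tika-parsers`: the `org.apache.tika.Tika` facade can still detect MIME types from magic bytes and from file names. The class name `CoreOnlyDetection`, its helper methods and the sample inputs are hypothetical.

```java
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;

// Sketch: MIME type detection with only tika-core on the classpath.
// Assumes Tika 1.x; no parser implementations (tika-parsers) are needed here.
class CoreOnlyDetection {
    // The Tika facade builds its default detector from the MIME type
    // definitions bundled in tika-core (magic bytes and file-name globs).
    private static final Tika TIKA = new Tika();

    // Detect by content: a leading "%PDF-" signature identifies a PDF.
    static String detectByContent(byte[] prefix) {
        return TIKA.detect(prefix);
    }

    // Detect by file name only, using the registered glob patterns.
    static String detectByName(String fileName) {
        return TIKA.detect(fileName);
    }

    public static void main(String[] args) {
        byte[] pdfHeader = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectByContent(pdfHeader));
        System.out.println(detectByName("report.xlsx"));
    }
}
```

Extracting text from a PDF, by contrast, still needs PDFParser and its PDFBox dependency, which is why the tests pull in tika-parsers with test scope.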

So the issue of dependency management is then passed on to the future users of our API. The use case we target is something like this: we have a CXF user with some custom application accepting documents in a limited set of formats (say PDF and Word, or Excel only, or a Photoshop-style application managing only a few types of images). We tell this user that CXF can help with searching through these documents and that the user can integrate it into the application. We tell the user to add the tika-parsers dependency, and the user asks us how to add only the PDF and Excel dependencies.

I don't want to recommend that they go through the exclusion process and possibly check the source tree, as you suggested in the other email :-)

Is tika-parsers effectively a collection of various parser dependencies, with no common dependencies that all the other parser implementations need, and with tika-core providing the support? If so, why don't we document which well-known modules support which file formats? That would let users not worry about tika-parsers at all and select the dependencies they need by checking the docs.
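The documentation proposed above could be as simple as a table from file format to the Tika parser class and the third-party libraries it needs. The sketch below models such a table in Java; the class, field and method names are hypothetical, and the few entries shown (PDFBox for PDF, Apache POI for Excel) are illustrative rather than a complete or authoritative list. Versions and transitive dependencies would still need to be checked against the real tika-parsers POM.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical lookup table for "which module supports which format" docs.
// Illustrative entries only; consult the tika-parsers POM for the real list.
class ParserDependencyLookup {
    // One documented entry: the Tika parser class and the Maven
    // coordinates (groupId:artifactId) of the libraries it needs.
    static final class Entry {
        final String parserClass;
        final List<String> libraries;

        Entry(String parserClass, List<String> libraries) {
            this.parserClass = parserClass;
            this.libraries = libraries;
        }
    }

    static final Map<String, Entry> BY_MIME_TYPE = Map.of(
        "application/pdf",
            new Entry("org.apache.tika.parser.pdf.PDFParser",
                      List.of("org.apache.pdfbox:pdfbox")),
        "application/vnd.ms-excel",
            new Entry("org.apache.tika.parser.microsoft.OfficeParser",
                      List.of("org.apache.poi:poi")),
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            new Entry("org.apache.tika.parser.microsoft.ooxml.OOXMLParser",
                      List.of("org.apache.poi:poi", "org.apache.poi:poi-ooxml")));

    // Collects the de-duplicated, sorted set of libraries a user would add
    // to handle only the listed formats; fails on undocumented formats.
    static Set<String> librariesFor(List<String> mimeTypes) {
        Set<String> result = new TreeSet<>();
        for (String type : mimeTypes) {
            Entry entry = BY_MIME_TYPE.get(type);
            if (entry == null) {
                throw new IllegalArgumentException("Undocumented format: " + type);
            }
            result.addAll(entry.libraries);
        }
        return result;
    }

    public static void main(String[] args) {
        // The "PDF and Excel only" user from the use case.
        System.out.println(librariesFor(List.of(
            "application/pdf",
            "application/vnd.ms-excel",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")));
        // prints [org.apache.pdfbox:pdfbox, org.apache.poi:poi, org.apache.poi:poi-ooxml]
    }
}
```

Keeping the table keyed by MIME type means it lines up with what tika-core's detection already reports, so a user could check the docs against the types their application actually receives.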


Sergey


Nick
