Hi Nick
On 18/06/14 17:07, Nick Burch wrote:
> On Wed, 18 Jun 2014, Sergey Beryozkin wrote:
>> The reason we need it is that CXF can not ship all of Tika Parser
>> dependencies because CXF will only offer a light-weight Tika-aware
>> handler.
> Sounds like you just want to depend on tika-core then, and not
> tika-parsers. That'll give you mime magic detection, and all the parser
> framework, but no parsers, and none of the parser dependencies. (You
> could manually pull in one or two parsers + their dependencies if you
> wanted to)
Yes, depending on tika-core alone made our main source code compile, and
adding tika-parsers with test scope made the tests that use PDFParser
pass. Thanks for the hint; I did not know tika-core was enough.
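
For anyone following along, that setup might look roughly like the
following in a Maven POM. This is only a sketch: the tika.version
property is a placeholder assumed to be defined elsewhere in the build,
not something taken from this thread.

```xml
<dependencies>
  <!-- Compile-time: detection and the parser framework only, no parsers -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>${tika.version}</version> <!-- placeholder property (assumption) -->
  </dependency>
  <!-- Test-only: pulls in the parser implementations (e.g. PDFParser)
       and their transitive dependencies for the test classpath -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>${tika.version}</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```

With test scope, tika-parsers is not passed on transitively to projects
that depend on this module, so they only inherit tika-core.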
So the dependency management issue is then passed on to the future
users of our API.
The use case we target is something like this: a CXF user has a custom
application that accepts documents in a limited set of formats (say PDF
and Word, or Excel only, or a Photoshop-like application managing only a
few image types). We tell this user that CXF can help with searching
through these documents and that this can be integrated into the
application. We tell the user to add the Tika parsers dependency, and
the user asks us how to get only the PDF and Excel dependencies added.
I don't want to recommend that they go through the exclusion process and
possibly check the source tree, as you suggested in the other email :-)
Is tika-parsers effectively just a collection of various parser
dependencies, with no common dependencies that all of the parser
implementations need, and with tika-core providing the support? If so,
why don't we document which well-known modules support which file
formats? That would let users not worry about tika-parsers at all and
select only the dependencies they need by checking the docs.
Sergey
Nick