Hi,
On 15/12/14 14:28, Nick Burch wrote:
On Mon, 15 Dec 2014, Sergey Beryozkin wrote:
OSGi users would pick tika + tika-parsers, or tika + tika-parsers-pdf,
or tika + tika-parsers-pdf + tika-parsers-mp3 if they want

OSGi is nicely contained, and fairly easy to unit test, so let's use
that to test out the idea! That also solves the CXF need. Once that
works, and once we have a tested way that everyone can see + understand,
then someone can try to make the case for phase II where we push it to
the maven pom / project level!

The need of CXF (Tika) users (or of some other users with possibly
similar requirements) is not about shipping OSGi-only Tika modules but
about having an easy option of not having to include all the
tika-parsers. Some CXF users would work with OSGi, some not. Sorry if
I did not clarify it.

I see us using OSGi as a way to test it, unit test it, and have unit
tested documentation for moderately advanced maven users. If we just put
up a page with "this is what we think you might need to exclude and
include", it'll almost always be wrong... Saying "OSGi users use this,
others take the info from a green build of the OSGi module" means we can
have tested docs!

OK.

As I said, a module marked as "bundle", as opposed to a default 'jar',
is just a plain jar with a few extra META-INF instructions.

Given that, I don't understand why you are opposed to having
tika-parsers minimized as I suggested. What exactly is your concern?

We have users who get confused by no parsers working when they depend on
tika-core only. Not so many on the list these days, but loads if you
look out into the wider internet at other support forums. Those kinds of
users will only find things worse if the tika parsers get split out.

We also have the massive faff that is maintaining tika parsers outside
of the tika-parsers module. It seemed a great theory, and we tried it.
The PDFBox one just didn't get picked up or maintained, never really
left, and the move was abandoned + main parser reverted to being in
Tika. I did all the Vorbis parser stuff outside as well, as championed
by the plan, and it has turned out to be a lot more work for me than if it'd
been in Tika itself. So, existing scars are another reason!

(That's why I suggested this as a compromise plan - change nothing for
normal Java users, until we see if it'll work + be of interest or not.
If it does work for all, case for the main change already made! If it
doesn't work, there's nothing to un-do)

I'm not proposing to split tika-parsers in a way that would affect
users. tika-parsers would still be there, except that it would depend
strongly on tika-pdf and, perhaps, when it is being built, it could have
dependencies like tika-pdf shaded in/merged in to ensure complete
backward compatibility as far as user expectations of tika-parsers are
concerned. I think that is your main concern: that users of tika-parsers
could be affected. Would what I just said above work for all? I'm hoping
yes, but maybe I'm still missing something :-)

Sergey



Nick

