I’d have to respectfully disagree with most of those points, but if there’s that much resistance to the idea, I’ll drop it.

Cheers,
Ray

On June 19, 2014 at 3:22:14 PM, Nick Burch ([email protected]) wrote:

> On Thu, 19 Jun 2014, Ray Gauss wrote:
> > The point of a tika-parsers-all artifact would be a single dependency
> > that re-aggregates everything so that downstream projects could work the
> > same way they do now and not worry about missing dependencies.
> >
> > What’s the disadvantage for splitting things up (in a 2.0 timeframe)?
>
> We already have users confused by the current split between tika-core and
> tika-parsers - see users list for example. We already have users confused
> by what dependencies they need with the current poms setup. Splitting is
> going to make that a lot worse. (POI, as a related example, sees plenty of
> confused users who've got mis-matched jars and problems. Splitting is
> going to make that a lot worse.)
>
> We have previously tried pushing parsers out of the tika parser jar and
> into other jars, eg ones maintained by external groups, but on the whole
> it hasn't been a great success. Keeping them in sync, dealing with
> different cycles, applying updates, keeping them consistent, building in a
> sensible length of time, all of that would be harder with a pile of
> modules.
>
> If we were to split out to the level needed by some of the use cases
> mentioned, we'd have so many parser modules it'd be a nightmare to
> maintain, and would cause the problems mentioned above. (People in other
> threads have cautioned on these problems). If we split into just a handful
> of sub modules, then many of the use cases mentioned still have to do
> work to pick out the bits they need.
>
> I still believe that the main use case of tika is "everything included",
> and especially that's the beginner's use case, so I think we should focus
> on keeping that easy. Peeling out just some bits feels like an advanced
> use case to me, so I'd rather we put the requirement for effort onto those
> folks, rather than onto newbies and people on the typical uses. I'd
> therefore much rather we provide advanced docs/help on excluding some
> bits, rather than pull it out into a pile of different modules.
>
> Nick
