Re: Plans for the first Tika 2.0 release

Nick Burch Wed, 21 Sep 2016 07:40:47 -0700

On Mon, 19 Sep 2016, Bob Paulin wrote:

I think it's a good thing to discuss. I know there are other features that are targeted for 2.0. Do we have a general sense of where those features are at?

I think the big one we need to crack is allowing multiple parsers to run against a file. OCR is probably the most critical of these from the modularisation perspective, with all those nasty interlinkings between the parsers to allow the manual delegation. If we can crack the problem of multiple parsers, those proxy issues should go away (or at least get better!)

As a bonus, it ought to also improve things for error cases (fallback parsers etc), but for your needs, the simplification for "ocr + image metadata" is likely your biggest win!

(I think it might also let us tidy up some of the enhancement parsers too, like how the NLP stuff fits into the parsing framework)
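
To make the two modes above concrete, here is a rough sketch of what "multiple parsers against one file" could look like. It is only an illustration under assumptions: SimpleParser, runAll and firstSuccess are hypothetical names made up for this example, not Tika's Parser API or any agreed 2.0 design.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiParserSketch {

    /** Hypothetical stand-in for a parser; this is NOT Tika's Parser interface. */
    interface SimpleParser {
        Map<String, String> parse(byte[] data) throws Exception;
    }

    /** Supplementary mode ("ocr + image metadata"): run every parser and merge the results. */
    static Map<String, String> runAll(List<SimpleParser> parsers, byte[] data) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (SimpleParser p : parsers) {
            try {
                // Earlier parsers win on key clashes; later ones only add new keys.
                p.parse(data).forEach(merged::putIfAbsent);
            } catch (Exception e) {
                // One failing parser should not discard what the others found.
            }
        }
        return merged;
    }

    /** Fallback mode: return the result of the first parser that does not throw. */
    static Map<String, String> firstSuccess(List<SimpleParser> parsers, byte[] data) throws Exception {
        Exception last = null;
        for (SimpleParser p : parsers) {
            try {
                return p.parse(data);
            } catch (Exception e) {
                last = e; // remember the failure and try the next parser
            }
        }
        throw new Exception("no parser succeeded", last);
    }

    public static void main(String[] args) throws Exception {
        // Toy parsers standing in for an image metadata parser, an OCR parser and a failing one.
        SimpleParser imageMetadata = d -> Map.of("width", "640");
        SimpleParser ocr = d -> Map.of("text", "hello");
        SimpleParser broken = d -> { throw new Exception("boom"); };

        System.out.println(runAll(List.of(imageMetadata, broken, ocr), new byte[0]));
        System.out.println(firstSuccess(List.of(broken, ocr), new byte[0]));
    }
}
```

With something of this shape, the image parser would not need to know about OCR at all, which is where the manual delegation and the proxy issues come from today. How merging and ordering should really behave is exactly the open design question.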


Nick
