On Mon, 19 Sep 2016, Bob Paulin wrote:
> I think it's a good thing to discuss. I know there are other features
> that are targeted for 2.0. Do we have a general sense of where those
> features are at?
I think the big one we need to crack is allowing multiple parsers to run
against a file. OCR is probably the most critical of these from the
modularisation perspective, with all those nasty interlinkings between the
parsers to allow the manual delegation. If we can crack the problem of
multiple parsers, those proxy issues should go away (or at least get
simpler).
As a bonus, it ought to also improve things for error cases (fallback
parsers etc), but for your needs, the simplification for "OCR + image
metadata" is likely your biggest win!
(I think it might also let us tidy up some of the enhancement parsers too,
like how the NLP stuff fits into the parsing framework)