NLP/NER is as high a priority to me as the OCR stuff. We have a whole meta
for doing NER/NLP with NERRecogniser and really cool TensorFlow and other stuff.
Hoping 2.0 can help solve this! ☺
Chris Mattmann, Ph.D.
Chief Architect, Instrument Software and Science Data Systems Section (398)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
On 9/21/16, 7:40 AM, "Nick Burch" <apa...@gagravarr.org> wrote:
On Mon, 19 Sep 2016, Bob Paulin wrote:
> I think it's a good thing to discuss. I know there are other features
> that are targeted for 2.0. Do we have a general sense of where those
> features are at?
I think the big one we need to crack is allowing multiple parsers to run
against a file. OCR is probably the most critical of these from the
modularisation perspective, with all those nasty interlinkings between the
parsers to allow the manual delegation. If we can crack the problem of
multiple parsers, those proxy issues should go away (or at least get simpler).
As a bonus, it ought to also improve things for error cases (fallback
parsers etc), but for your needs, the simplification for "ocr + image
metadata" is likely your biggest win!
(I think it might also let us tidy up some of the enhancement parsers too,
like how the NLP stuff fits into the parsing framework)
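To make the two ideas above concrete, here is a rough sketch of "run several parsers against one file" and "fall back to the next parser on error". It is only an illustration under stated assumptions: the `Parser` type, `run_all`, `run_with_fallback` and the three toy parsers are hypothetical names invented for this sketch, not Tika's API and not a proposed 2.0 design.

```python
from typing import Callable, Dict, List

# A "parser" here is just a function from raw bytes to a metadata dict.
# This is NOT Tika's Parser interface, only a stand-in for illustration.
Parser = Callable[[bytes], Dict[str, str]]


def run_all(parsers: List[Parser], data: bytes) -> Dict[str, str]:
    """Supplementary mode: run every parser and merge what each one finds."""
    merged: Dict[str, str] = {}
    for parser in parsers:
        for key, value in parser(data).items():
            # First parser to set a key wins; later ones only add new keys.
            merged.setdefault(key, value)
    return merged


def run_with_fallback(parsers: List[Parser], data: bytes) -> Dict[str, str]:
    """Fallback mode: return the result of the first parser that succeeds."""
    errors = []
    for parser in parsers:
        try:
            return parser(data)
        except Exception as exc:
            # Remember the failure and move on to the next candidate.
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} parsers failed")


# Hypothetical parsers standing in for image metadata extraction and OCR.
def image_metadata_parser(data: bytes) -> Dict[str, str]:
    return {"size_bytes": str(len(data))}


def ocr_parser(data: bytes) -> Dict[str, str]:
    return {"text": "(recognised text would go here)"}


def broken_parser(data: bytes) -> Dict[str, str]:
    raise ValueError("cannot handle this file")
```

With something of this shape, the "ocr + image metadata" case is `run_all([image_metadata_parser, ocr_parser], data)`, with neither parser needing to know about or delegate to the other, and the error case is `run_with_fallback([broken_parser, image_metadata_parser], data)`.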