Re: Plans for the first Tika 2.0 release

Mattmann, Chris A (3980) Wed, 21 Sep 2016 11:25:08 -0700

NLP/NER is as high a priority to me as the OCR stuff. We have a whole meta
framework for doing NER/NLP with NERRecogniser, plus really cool TensorFlow
and other stuff. Hoping 2.0 can help solve this! ☺


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect, Instrument Software and Science Data Systems Section (398)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 


On 9/21/16, 7:40 AM, "Nick Burch" <[email protected]> wrote:

    On Mon, 19 Sep 2016, Bob Paulin wrote:
    > I think it's a good thing to discuss.  I know there are other features 
    > that are targeted for 2.0.  Do we have a general sense of where those 
    > features are at?
    
    I think the big one we need to crack is allowing multiple parsers to run 
    against a file. OCR is probably the most critical of these from the 
    modularisation perspective, with all those nasty interlinkings between the 
    parsers to allow the manual delegation. If we can crack the problem of 
    multiple parsers, those proxy issues should go away (or at least get 
    better!)
    
    As a bonus, it ought to also improve things for error cases (fallback 
    parsers etc), but for your needs, the simplification for "ocr + image 
    metadata" is likely your biggest win!
    
    (I think it might also let us tidy up some of the enhancement parsers too, 
    like how the NLP stuff fits into the parsing framework)
    
    Nick
