I wanted to personally thank Grant for pushing this and getting the initial code and idea started. Thank you, Grant, you da man.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: "Grant Ingersoll (JIRA)" <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, September 19, 2014 7:34 AM
To: "[email protected]" <[email protected]>
Subject: [jira] [Commented] (TIKA-93) OCR support

>
>    [ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140649#comment-14140649 ]
>
>Grant Ingersoll commented on TIKA-93:
>-------------------------------------
>
>Very cool! Thanks for following through on this!
>
>> OCR support
>> -----------
>>
>>         Key: TIKA-93
>>         URL: https://issues.apache.org/jira/browse/TIKA-93
>>     Project: Tika
>>  Issue Type: New Feature
>>  Components: parser
>>    Reporter: Jukka Zitting
>>    Assignee: Chris A. Mattmann
>>    Priority: Minor
>>     Fix For: 1.7
>>
>> Attachments: Petr_tika-config.xml, TIKA-93.patch, TIKA-93.patch,
>>     TIKA-93.patch, TIKA-93.patch, TesseractOCRParser.patch,
>>     TesseractOCRParser.patch, TesseractOCR_Tyler.patch,
>>     TesseractOCR_Tyler_v2.patch, TesseractOCR_Tyler_v3.patch,
>>     TesseractOCR_Tyler_v4.patch, testOCR.docx, testOCR.pdf, testOCR.pptx
>>
>>
>> I don't know of any decent open source pure Java OCR libraries, but
>> there are command line OCR tools like Tesseract
>> (http://code.google.com/p/tesseract-ocr/) that could be invoked by Tika
>> to extract text content (where available) from image files.
>
>
>
>--
>This message was sent by Atlassian JIRA
>(v6.3.4#6332)
