Hi Chris,

I was playing with it recently. One of the big issues with Tesseract is the difficult process of preparing training sets for multiple fonts and languages. In addition, we also have to add an option for image preprocessing (skew correction, filtering, etc.).
BR,
Oleg

On Wed, Nov 30, 2011 at 8:59 AM, Mattmann, Chris A (388J) <[email protected]> wrote:

> Hey Guys,
>
> FYI: http://code.google.com/p/tesseract-ocr/
>
> I was pointed at this library by someone recently asking me if Tika
> was interested in integrating with this library. It's ALv2 licensed, and
> seems pretty interesting. I'm going to check it out, but just
> wanted to give everyone a heads up.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
