Hi Chris,
I was playing with it recently.
One of the big issues with tesseract is the tedious process of preparing
training sets for multiple fonts and languages.
In addition, we would also have to add an option for image preprocessing
(skew correction, filtering, etc.).
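
To make the preprocessing point a bit more concrete, here is a rough sketch
of the filtering half only. It is plain Python, not Tika or tesseract code,
and the function names (median_filter, binarize, preprocess) and the default
threshold of 128 are made up for illustration. It assumes a grayscale image
given as a list of rows of 0-255 values, applies a 3x3 median filter to
remove speckle, then binarizes with a fixed threshold. Skew correction is
left out and would need a separate step.

```python
from statistics import median


def median_filter(img):
    """Apply a 3x3 median filter to a grayscale image (list of rows, 0-255)."""
    h, w = len(img), len(img[0])
    # Border pixels have no full 3x3 neighbourhood, so they are copied as-is.
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[j][i]
                      for j in (y - 1, y, y + 1)
                      for i in (x - 1, x, x + 1)]
            out[y][x] = int(median(window))
    return out


def binarize(img, threshold=128):
    """Map every pixel to 0 or 255 using a fixed (assumed) threshold."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


def preprocess(img, threshold=128):
    """Hypothetical pipeline: denoise first, then binarize."""
    return binarize(median_filter(img), threshold)
```

A real integration would more likely make each step optional and use an
adaptive threshold, since scan quality varies a lot between documents.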


BR,
Oleg

On Wed, Nov 30, 2011 at 8:59 AM, Mattmann, Chris A (388J) <
[email protected]> wrote:

> Hey Guys,
>
> FYI: http://code.google.com/p/tesseract-ocr/
>
> I was pointed at this library by someone recently asking me if Tika
> was interested in integrating with this library. It's ALv2 licensed, and
> seems pretty interesting. I'm going to check it out, but just
> wanted to give everyone a heads up.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
