Have you seen the Paperless <https://github.com/danielquinn/paperless> project? Its author is using Tesseract OCR as well. There's another Paperless on GitHub for Mac OS X too.
On Tue, Jul 19, 2016 at 5:05 PM, 4kbytes <4kby...@zoho.com> wrote:
> Hello,
>
> I am currently working with the Tesseract OCR. Tesseract is owned by
> Google with Apache 2.0 licensing.
>
> The issue I am running into is text accuracy.
>
> The current process: target text color to black, background to white, max
> contrast, pass to OCR.
>
> With documents from modern word processors this approach is accurate 98%
> of the time. When trying to read commercial serials or IDs, which can
> be very compact, the result is accurate in character count but not in
> the characters themselves.
>
> Has anyone worked with this system before and knows of a possible
> solution? I am currently looking into ImageMagick.
>
> _______________________________________________
> gnhlug-discuss mailing list
> gnhlug-discuss@mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
>
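For what it's worth, the "text to black, background to white, max contrast" step quoted above can be sketched in a few lines. This is only an illustration, not the original poster's code: it assumes the image is already an 8-bit grayscale grid with dark text on a light background, and it picks the cut-off with Otsu's method, which is my substitution for a hand-tuned contrast setting. The names otsu_threshold and binarize are made up for the example, and handing the result to Tesseract is not shown.

```python
def otsu_threshold(pixels):
    """Pick the gray level (0-255) that best separates dark from light pixels.

    Otsu's method: try every level and keep the one that maximises the
    between-class variance of the two resulting groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_dark = 0   # number of pixels at or below the candidate level
    sum_dark = 0      # sum of gray values of those pixels
    for level in range(256):
        weight_dark += hist[level]
        sum_dark += level * hist[level]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to separate
        if weight_light == 0:
            break     # every pixel is already in the dark group
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(rows):
    """Map a grayscale image (list of rows) to pure black (0) and white (255).

    Assumption: text is darker than the background, so pixels at or below
    the threshold become black text and everything else becomes white.
    """
    threshold = otsu_threshold([p for row in rows for p in row])
    return [[0 if p <= threshold else 255 for p in row] for row in rows]


if __name__ == "__main__":
    # Tiny made-up sample: three dark "ink" pixels among light background.
    sample = [[30, 40, 200],
              [210, 35, 220]]
    for row in binarize(sample):
        print(row)
```

An automatically chosen threshold like this may hold up better than a fixed contrast setting on small, tightly spaced print, but that is a guess on my part and would need testing on the actual serial-number scans.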