Have you seen the Paperless <https://github.com/danielquinn/paperless>
project?
Its author is using Tesseract OCR as well. There's another project called
Paperless on GitHub for Mac OS X too.
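
In case it helps, here is a minimal sketch of the preprocessing step you
describe below (text to black, background to white, maximum contrast), in
plain Python. It is not your actual pipeline. The use of Otsu's method to pick
the threshold, the function names otsu_threshold and binarize, and the
assumption of dark text on a lighter background in an 8-bit greyscale image
are all mine.

```python
def otsu_threshold(pixels):
    """Pick a grey level (0-255) separating dark from light pixels (Otsu's method).

    `pixels` is a list of rows, each a list of ints in 0..255 (assumed format).
    """
    # Histogram of grey levels.
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels with value <= t
    sum_dark = 0  # sum of the grey levels of those pixels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no pixels in the dark class yet
        w_light = total - w_dark
        if w_light == 0:
            break     # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance; the threshold that maximizes it wins.
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels):
    """Map pixels at or below the threshold to 0 (black), all others to 255 (white)."""
    t = otsu_threshold(pixels)
    return [[0 if value <= t else 255 for value in row] for row in pixels]


if __name__ == "__main__":
    # Tiny made-up sample: three dark "text" pixels, three light "background" pixels.
    sample = [[30, 40, 200],
              [35, 210, 220]]
    for row in binarize(sample):
        print(row)
```

For a real scan you would load the pixels with an imaging library and hand the
binarized result to Tesseract. Whether an automatically chosen threshold does
better than a fixed one on compact serials is something you would have to test.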

On Tue, Jul 19, 2016 at 5:05 PM, 4kbytes <4kby...@zoho.com> wrote:

> Hello,
>
> I am currently working with Tesseract OCR. Tesseract's development is
> sponsored by Google, and it is released under the Apache 2.0 license.
>
> The issue I am running into is text accuracy.
>
> The current process: set the text color to black and the background to
> white, maximize contrast, then pass the image to the OCR engine.
>
> With documents from modern word processors this approach is accurate 98%
> of the time. When trying to read commercial serials or IDs, which can be
> very compact, the result has the correct number of characters but the
> wrong characters.
>
> Has anyone worked with this system before who knows of a possible
> solution? I am currently looking into ImageMagick.
>
> _______________________________________________
> gnhlug-discuss mailing list
> gnhlug-discuss@mail.gnhlug.org
> http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
>