Re: [gentoo-user] multi-region OCR

Landis Blackwell Wed, 30 Nov 2016 11:49:23 -0800

Did you train tesseract, by any chance? And could I get some sample images?


Landis


On 11/30/2016 12:28 PM, Michael Mol wrote:

On Wednesday, November 30, 2016 05:34:25 PM J. Roeleveld wrote:

On November 30, 2016 6:03:36 PM GMT+01:00, Michael Mol <[email protected]> wrote:

On Wednesday, November 30, 2016 10:43:13 AM J. Roeleveld wrote:

On Tuesday, November 29, 2016 11:18:36 PM [email protected] wrote:

Michael Mol:
...

xsane would have let me do it during the scan process if I'd thought of it
then, but the scans are done, drives aren't there any more. Something
...

If xsane solves your need why don't you just print your scans so xsane can do
its job?

There has to be a way to do this without killing an entire forest...

And big chunks of ink cartridges. The scans stretched the contrast so I can
clearly read the drive labels through the translucent anti-static bags, which
means a huge chunk of the image (what's outside the labels) is pure black.

Which I could get around by spending fifteen minutes munging things in the
Gimp before printing, but at that point, I may as well just transcribe things
manually.

Looking for something reasonably simple to improve the general workflow. I'd
have hoped something would have already been available on Linux; it'd be easy
enough to copy the scans to my phone and feed them through Google Goggles for
the desired output, but then I'm deliberately filtering company data through
an outside entity.

Did you manage to use that link I sent?

I did. tesseract almost worked, even separating the regions cleanly in its
output, but it seems, sadly, that the 300dpi scans were insufficient to get a
good read; lots of clear corruption of the text, so things like serial
numbers, model numbers, version numbers--everything you'd care about--would be
highly suspect.

The next tool that looked like it might work, gscan2pdf, wasn't in portage,
and with the semi-garbled output from tesseract suggesting the scans were too
poor quality, I didn't pursue further.
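
Since the area outside the labels is pure black, the label regions could in
principle be located automatically and cropped before being handed to an OCR
tool one at a time. What follows is only a minimal sketch of that idea, not
anything used in this thread: the function name bright_regions, the threshold
and min_area defaults, and the plain list-of-rows grayscale input are all
assumptions made for illustration. Loading a real scan into that form would
need an image library, which is not shown here.

```python
from collections import deque


def bright_regions(pixels, threshold=32, min_area=4):
    """Find bounding boxes of bright areas on a dark background.

    pixels is a list of rows of grayscale values (0 = black).
    Returns (left, top, right, bottom) tuples, inclusive, one per
    4-connected region of pixels brighter than threshold.
    Regions smaller than min_area pixels are treated as noise.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []

    for y in range(height):
        for x in range(width):
            if seen[y][x] or pixels[y][x] <= threshold:
                continue

            # Flood-fill outward from this bright pixel, tracking extents.
            queue = deque([(x, y)])
            seen[y][x] = True
            left = right = x
            top = bottom = y
            area = 0

            while queue:
                cx, cy = queue.popleft()
                area += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and pixels[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))

            if area >= min_area:
                boxes.append((left, top, right, bottom))

    return boxes


if __name__ == "__main__":
    # Tiny synthetic "scan": two bright labels on a black background.
    scan = [[0] * 10 for _ in range(6)]
    for row in (1, 2):
        for col in (1, 2, 3):
            scan[row][col] = 200
    for row in (3, 4):
        for col in (6, 7, 8):
            scan[row][col] = 180
    for box in bright_regions(scan):
        print(box)
```

Each box could then be cropped out and upscaled on its own. Whether that
would rescue the 300dpi scans described above is untested.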
