Re: [sane-devel] Scan quality enhancements/processing (vs Windows with Fujitsu ScanSnap S1500)

2017-07-07 Thread m. allan noah
Matt-

I wrote both the sane-fujitsu backend and the sanei_magic library
that it uses to provide the deskew, crop, and related functions. I'll
try to answer your questions as best I can. BTW, sorry I did not reply
sooner: I have been traveling, and wanted to write something a bit
longer than I could stand to type on my cell phone :)

Further comments inline:

On Mon, Jul 3, 2017 at 4:02 PM, Matt Garman wrote:
> TL;DR: for those of you who have migrated your document scanning
> workflow from Windows to a Sane platform (e.g. Linux), what
> settings/tools have you found to maintain or improve quality of
> scanned documents (relative to Windows)?
>
> Long version:
>
> I have a Fujitsu ScanSnap S1500 document scanner, and sane-1.0.27
> running on Arch Linux.  So far it seems to just work.  I've had this
> scanner for nearly a decade, and used it exclusively under Windows
> until now (trying to move to a pure Linux desktop).
>
> So while I can scan documents just fine, the results to me aren't as
> good as what I get under Windows using the proprietary ScanSnap
> software.  Specifically, they are too light/too dark, text not crisp
> enough, straight lines not straight, colors a bit off, etc.
>
> One example: de-skewing.  All the years I've had this scanner, I
> didn't even realize this was a thing until now.  I can use scanimage's
> software de-skewing (--swdeskew=yes), and it seems to *mostly* work,
> but pages are often still somewhat skewed.  Except for
> wacky/unusual documents, I don't recall ever seeing any skew under
> Windows.

That is not scanimage providing the deskew, but the fujitsu backend
(driver) itself, using the sanei_magic library, and your host system's
CPU. The deskewing algorithm uses a simplified Hough transform, which
attempts to detect the edges of the paper instead of the print on it.
This code works better if you expand the dimensions of the scan to
grab some of the background. If you attempt to crop the image too
small using the page_width and page_height or x/y params, it will
likely fail to deskew, or pick the wrong feature to align, which will
make things worse.
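
To illustrate the general idea of Hough-style voting for a skew angle,
here is a minimal Python sketch. It is not the sanei_magic code: the
function name, its parameters, and the assumption that the paper-edge
points have already been extracted as (x, y) pairs are all hypothetical.
Each candidate angle gets a score equal to the largest number of edge
points that fall on a single line at that angle, and the best-scoring
angle is taken as the skew.

```python
import math


def estimate_skew_degrees(edge_points, max_angle=10.0, step=0.1, rho_res=1.0):
    """Estimate the angle (degrees) of the dominant near-horizontal line.

    edge_points: iterable of (x, y) points assumed to lie on the paper edge.
    Hypothetical helper for illustration; not part of SANE.
    """
    points = list(edge_points)
    if not points:
        return 0.0

    best_angle, best_votes = 0.0, -1
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        cos_t, sin_t = math.cos(theta), math.sin(theta)

        # Points on one line at this angle share the same offset rho,
        # so they all vote for the same accumulator bin.
        votes = {}
        for x, y in points:
            rho = y * cos_t - x * sin_t
            key = round(rho / rho_res)
            votes[key] = votes.get(key, 0) + 1

        top = max(votes.values())
        if top > best_votes:
            best_angle, best_votes = angle, top
    return best_angle
```

This also shows why cropping too tightly hurts: if the paper edge is
not in the image, the points that end up voting come from some other
feature, and the winning angle follows that feature instead.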

In general, if you are using swdeskew, it is probably better to scan
at full width, and use the swcrop option too. Also, some fujitsu
machines support the overscan option, which will cause the scanner to
output some extra background rows before the paper is ingested. This
can significantly improve the swdeskew performance. The S1500 does not
have a black background option, but the larger scanners do, and this
will also help.

If you have a document that consistently reproduces poor deskewing,
even with those additional options, I'd like to see a .pnm file of the
scan with and without swdeskew enabled.
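
As a sketch of how such a pair of test scans could be produced, the
Python helper below builds two scanimage command lines that differ only
in swdeskew. The helper names and the "fujitsu" device string are
placeholders, and the exact value spelling for the overscan option
("On") is an assumption; check `scanimage --help -d <your device>` for
what your scanner actually offers. No page size or x/y options are
passed, so the scan stays at full width.

```python
import subprocess


def build_scan_args(deskew, crop=True, overscan=True, device="fujitsu"):
    """Build a scanimage argument list (hypothetical helper).

    'device' is a placeholder; use the name reported by `scanimage -L`.
    """
    args = ["scanimage", "-d", device, "--format=pnm"]
    if overscan:
        # Assumed value spelling; not every model supports overscan.
        args.append("--overscan=On")
    args.append("--swdeskew=" + ("yes" if deskew else "no"))
    args.append("--swcrop=" + ("yes" if crop else "no"))
    return args


def scan_to_file(args, path):
    """Run scanimage and write the image it prints on stdout to 'path'."""
    with open(path, "wb") as out:
        subprocess.run(args, stdout=out, check=True)


# Example (needs a connected scanner, so it is left commented out):
# scan_to_file(build_scan_args(deskew=True), "with_deskew.pnm")
# scan_to_file(build_scan_args(deskew=False), "without_deskew.pnm")
```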

>
> Despeckling (--swdespeck=n) does seem to be a major step in the right 
> direction.

Yes- particularly if you are scanning in binary (line art) mode, small
amounts of noise can be distracting.
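
For readers unfamiliar with despeckling, here is a minimal Python
sketch of the concept on a binary image. It is an illustration only and
makes no claim about how the backend implements swdespeck: the function
name, the list-of-rows image format (1 = black, 0 = white), and the
rule "remove any connected blob whose bounding box fits inside
max_size x max_size pixels" are all assumptions.

```python
def despeckle(image, max_size):
    """Return a copy of 'image' with small isolated black blobs removed.

    image: list of rows, each a list of 0 (white) or 1 (black).
    max_size: blobs whose bounding box is at most max_size pixels wide
    and tall are erased. Hypothetical helper for illustration.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    out = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood fill to collect one 8-connected blob of black pixels.
            stack = [(x, y)]
            seen[y][x] = True
            blob = []
            while stack:
                cx, cy = stack.pop()
                blob.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((nx, ny))
            xs = [p[0] for p in blob]
            ys = [p[1] for p in blob]
            # Erase the blob only if it is small in both dimensions.
            if max(xs) - min(xs) < max_size and max(ys) - min(ys) < max_size:
                for bx, by in blob:
                    out[by][bx] = 0
    return out
```

A larger max_size removes more noise but also starts to eat small
legitimate marks such as periods and the dots on "i", which is why the
setting is worth tuning per document type.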

>
> I'm also playing with all the enhancement options.  E.g.,
> --brightness, --contrast, --emphasis, etc.  Brightness and contrast
> are fairly intuitive, but I don't really understand what the other
> options actually mean, or what I should expect from them.  I've been
> taking the trial-and-error approach, but e.g. --variance doesn't seem
> to do anything.  And I'm not sure how the options interact with each
> other, so trial-and-error could take forever.

All the options you list here are values which we send to the
hardware. Frankly, I have little documentation about what they do, but
it is certainly possible that some of these only have effect in binary
mode, and they may not even work on the S1500. I'll see if I can track
that down, and disable them in cases where they cannot be used.

> Having said all that, my one test document is maybe 90% as "good" as
> the same scanned on Windows.  Probably good enough to live with, but:
>
> (1) This seems to be a very popular scanner - has anyone been able to
> back out the settings the proprietary ScanSnap software uses?
> - and -
> (2) I wonder if the Windows ScanSnap software settings are static or
> dynamic?  E.g., is there some kind of pre-processing algorithm applied
> to guess at the best enhancement settings?

I'm willing to guess that much of what you are seeing is the Windows
software making a larger scan, potentially at a higher resolution or
in a different color mode, and then cleaning it up in software.

I think we will need side-by-side example images from Windows and
SANE, along with the list of options used for each, to diagnose further.

allan
-- 
"well, I stand up next to a mountain- and I chop it down with the edge
of my hand"

-- 
sane-devel mailing list: sane-devel@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/sane-devel
Unsubscribe: Send mail with subject "unsubscribe your_password"
 to sane-devel-requ...@lists.alioth.debian.org


Re: [sane-devel] OCR

2017-07-07 Thread Yuri Chornoivan
On Thursday, 29 June 2017 18:02:17, Peter Dick wrote:
> Hello, I want to run OCR on a scanned PDF.
> When choosing a file name I get an error!
> What are the right steps for OCR with "XSane"?
> 
> Thanks for an answer
> 
> Peter Dick

Hi,

I don't think XSane is the right tool for running OCR on PDFs. It is
much more convenient to do this with gscan2pdf: Ctrl+I (import PDF), then
"Tools -> OCR", then Ctrl+S.

Other options are OCRFeeder and the pdfocr script [1].

Hope this helps.

Best regards,
Yuri

[1] https://raw.githubusercontent.com/gkovacs/pdfocr/master/pdfocr.rb


[sane-devel] OCR

2017-07-07 Thread Peter Dick

Hello, I want to run OCR on a scanned PDF.
When choosing a file name I get an error!
What are the right steps for OCR with "XSane"?

Thanks for an answer

Peter Dick

--
Dipl.Ing.Peter Dick

Am Weinberg 21
88697 Bermatingen
07544-2216

