At 10:29 pm +1000 19/8/06, [EMAIL PROTECTED] wrote:
I would also like to know what people are doing in this quarter. We are
looking at some research projects where we might be OCRing 100,000 pages
(already scanned), so any simple efficiencies will be worthwhile.
thanks
jon patrick
Quoting Ian Cheong <[EMAIL PROTECTED]>:
I would like to know if anybody is scanning to OCR'd pdfs, or knows
anyone in GP land who is doing so.
I'd prefer to do this to scanning only images or only OCR, because
the content is then text searchable with background image retained
for "medicolegal" purposes.
There seem to be several commercial products ranging from 100s to
1000s of dollars that will do this - interested to know what products
work well.
You can download several of the commercial programs in demo (time
limited) version for free - so DIY testing it is for now. Having
trouble finding any good comparative reviews on the net.
My limited testing suggests you could be stuck with "horses for
courses" - depending on quality of scanned images and need for
recognition accuracy.
A friend who works for an accounting firm says OmniPage is best - but
then they might have a bent for accurate numbers?
I scanned the same invoice, which has a logotype, text fonts and tiny
fonts, and ran the scans through a couple of OCR programs that produce
layered PDF (image+text). File size differed by a factor of more than
10. Recognition accuracy varied with the quality of the original scan
used. There was no clear, consistent winner.
Anyone interested in the GP scanning challenge??
We could get some representative examples of anonymised scanned
documents of varying quality to test OCR engines for:
recognition accuracy
speed
file size
They and the test results could have a permanent home on ozdocit.
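To make the three measures concrete, here is a rough sketch of how one
engine could be scored on one document. It is only an illustration:
run_ocr is a hypothetical wrapper you would write around whichever
product is under test (none of the products has been checked for a
scriptable interface), and character error rate via edit distance is
my assumption of a reasonable accuracy measure, not something agreed
on the list. It also assumes a hand-corrected reference transcript
exists for each test document.

```python
import os
import time
from typing import Callable, Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, recognised: str) -> float:
    """Edit distance divided by reference length (0.0 means a perfect match)."""
    if not reference:
        return 0.0 if not recognised else 1.0
    return edit_distance(reference, recognised) / len(reference)


def score_engine(run_ocr: Callable[[str, str], str],
                 scan_path: str,
                 reference_text: str,
                 output_pdf_path: str) -> Dict[str, float]:
    """Score one engine on one scan.

    run_ocr is a HYPOTHETICAL wrapper: it takes the scan path and the
    output PDF path, writes the layered PDF, and returns the recognised text.
    """
    start = time.perf_counter()
    recognised = run_ocr(scan_path, output_pdf_path)
    seconds = time.perf_counter() - start
    size = os.path.getsize(output_pdf_path) if os.path.exists(output_pdf_path) else 0
    return {
        "character_error_rate": character_error_rate(reference_text, recognised),
        "seconds": seconds,
        "file_size_bytes": size,
    }
```

Running score_engine over every engine and every test document would
give a table of results that could sit alongside the scans themselves.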
Ian.
--
Dr Ian R Cheong, BMedSc, FRACGP, GradDipCompSc, MBA(Exec)
Health Informatics Consultant, Brisbane, Australia
Internet: [EMAIL PROTECTED]
(for urgent matters, please send a copy to my practice email as well:
[EMAIL PROTECTED])
PRIVACY NOTE
I am happy for others to forward on email sent by me to public email lists.
Please ask my permission first if you wish to forward private email
to other parties.