Does the processed text need to be faultless (i.e. will it need human checking)?
We scan about 6,000 pages per year, with human checking of the OCR'd text. Our process wouldn't scale very well, and even at this low end of the scale it is very labour intensive. For a batch of 100,000 you would need high-end software such as that used by the large law firms to construct their knowledge bases.

Tony Eviston
G.P.
Boonah Medical Centre

As an aside - wouldn't they (the legal profession) love to get their hands on 100,000 medical documents for trawling purposes! I think I'll hand write my medico-legal reports from now on. A quick search on our 58,000 documents for the strings "unfortunately" + "hospital" returned 491 hits, including the likes of the following:

Unfortunately this has resulted in confusion as to who to contact regarding his care.
Unfortunately the waiting period for varicose vein surgery at xxxxxx Hospital is several years
1 note that his cochlear implant late last year has unfortunately not been successful
Unfortunately the specimen has gone astray.
Unfortunately there was a modest amount of distal embolisation into the distal obtuse marginal artery
Unfortunately this MRI was not compared to the previous MRI done in November.
etc

[EMAIL PROTECTED] wrote:
> I would also like to know what people are doing in this quarter. We are
> looking at some research projects where we might be OCRing 100,000 pages
> (already scanned), so any simple efficiencies will be worthwhile.
> thanks
> jon patrick
> Quoting Ian Cheong <[EMAIL PROTECTED]>:
>
>> I would like to know if anybody is scanning to OCR'd pdfs, or knows
>> anyone in GP land who is doing so.
>>
>> I'd prefer to do this to scanning only images or only OCR, because
>> the content is then text searchable with background image retained
>> for "medicolegal" purposes.
>>
>> There seem to be several commercial products ranging from 100s to
>> 1000s of dollars that will do this - interested to know what products
>> work well.
>>
>>
>> Ian.
>>
>> --
>> Dr Ian R Cheong, BMedSc, FRACGP, GradDipCompSc, MBA(Exec)
>> Health Informatics Consultant, Brisbane, Australia
>> Internet: [EMAIL PROTECTED]
>> (for urgent matters, please send a copy to my practice email as well:
>> [EMAIL PROTECTED])
>>
>> PRIVACY NOTE
>> I am happy for others to forward on email sent by me to public email
>> lists.
>> Please ask my permission first if you wish to forward private email
>> to other parties.
>> _______________________________________________
>> Gpcg_talk mailing list
>> [email protected]
>> http://ozdocit.org/cgi-bin/mailman/listinfo/gpcg_talk
