No, we are not going to human-check everything, but we will do a trial run to "tune" the OCR before doing the full batch.

thanks
jon

Quoting Tony Eviston <[EMAIL PROTECTED]>:
> Does the processed text need to be faultless (i.e. does it need human
> checking)?
>
> We scan about 6,000 pages per year with human checking of the OCR'd
> text. Our process wouldn't scale very well, and even at this low end of
> the scale it is very labour intensive.
>
> For a batch of 100,000 you would need high-end software, such as the
> large law firms use to construct their knowledge bases.
>
> Tony Eviston
> G.P.
> Boonah Medical Centre
>
> As an aside, wouldn't they (the legal profession) love to get their
> hands on 100,000 medical documents for trawling purposes! I think I'll
> hand-write my medico-legal reports from now on.
> A quick search of our 58,000 documents for the strings "unfortunately" +
> "hospital" returned 491 hits, including the likes of the following:
>
> Unfortunately this has resulted in confusion as to who to contact
> regarding his care.
>
> Unfortunately the waiting period for varicose vein surgery at xxxxxx
> Hospital is several years
>
> 1 note that his cochlear implant late last year has unfortunately not
> been successful
>
> Unfortunately the specimen has gone astray.
>
> Unfortunately there was a modest amount of distal embolisation into the
> distal obtuse marginal artery
>
> Unfortunately this MRI was not compared to the previous MRI done in
> November.
>
> etc
>
> [EMAIL PROTECTED] wrote:
> > I would also like to know what people are doing in this quarter. We are
> > looking at some research projects where we might be OCRing 100,000 pages
> > (already scanned), so any simple efficiencies will be worthwhile.
> > thanks
> > jon patrick
> > Quoting Ian Cheong <[EMAIL PROTECTED]>:
> >
> >> I would like to know if anybody is scanning to OCR'd PDFs, or knows
> >> anyone in GP land who is doing so.
> >>
> >> I'd prefer this to scanning only images or only OCR, because
> >> the content is then text-searchable, with the background image retained
> >> for "medicolegal" purposes.
> >>
> >> There seem to be several commercial products, ranging from hundreds to
> >> thousands of dollars, that will do this. I'm interested to know which
> >> products work well.
> >>
> >> Ian.
> >>
> >> --
> >> Dr Ian R Cheong, BMedSc, FRACGP, GradDipCompSc, MBA(Exec)
> >> Health Informatics Consultant, Brisbane, Australia
> >> Internet: [EMAIL PROTECTED]
> >> (for urgent matters, please send a copy to my practice email as well:
> >> [EMAIL PROTECTED])
> >>
> >> PRIVACY NOTE
> >> I am happy for others to forward on email sent by me to public email
> >> lists.
> >> Please ask my permission first if you wish to forward private email
> >> to other parties.
> >> _______________________________________________
> >> Gpcg_talk mailing list
> >> [email protected]
> >> http://ozdocit.org/cgi-bin/mailman/listinfo/gpcg_talk
> >
> > ----------------------------------------------------------------
> > This message was sent using IMP, the Internet Messaging Program.
