Kevin,

This may or may not be what you are looking for, and it is definitely a solution 
only an engineer would think of.  In JavaScript/jQuery, an image can be loaded 
into an HTML canvas element (maybe a PDF also, not sure) and displayed inside 
the browser (and saved if needed).  Using Google's Cloud Vision API, the text 
can then be pulled from the image that is being displayed in the browser.  My 
experience has been that the OCR results are very good.
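
A rough sketch of what that could look like, in case it helps.  It assumes the 
card image is already drawn on a canvas and that you have a Cloud Vision API 
key; the function names (dataUrlToBase64, buildVisionRequest, extractText, 
ocrCanvas) are just illustrative names, not part of any library.  It posts the 
canvas contents to the Vision images:annotate endpoint and reads the recognized 
text back out of the response.

```javascript
// canvas.toDataURL() returns "data:image/png;base64,<payload>".
// The Vision API wants only the raw base64 payload, so drop the prefix.
function dataUrlToBase64(dataUrl) {
  const comma = dataUrl.indexOf(",");
  return comma === -1 ? dataUrl : dataUrl.slice(comma + 1);
}

// Build the JSON body for a single-image OCR request.
// DOCUMENT_TEXT_DETECTION is the feature type meant for dense text.
function buildVisionRequest(dataUrl) {
  return {
    requests: [
      {
        image: { content: dataUrlToBase64(dataUrl) },
        features: [{ type: "DOCUMENT_TEXT_DETECTION" }],
      },
    ],
  };
}

// Pull the full recognized text out of a Vision response,
// or return an empty string if nothing was recognized.
function extractText(visionResponse) {
  const first = visionResponse.responses && visionResponse.responses[0];
  return first && first.fullTextAnnotation ? first.fullTextAnnotation.text : "";
}

// Browser-only part: send the canvas contents to the API.
// apiKey is a placeholder for your own key (an assumption of this sketch).
async function ocrCanvas(canvas, apiKey) {
  const url =
    "https://vision.googleapis.com/v1/images:annotate?key=" +
    encodeURIComponent(apiKey);
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildVisionRequest(canvas.toDataURL("image/png"))),
  });
  if (!res.ok) {
    throw new Error("Vision API request failed: " + res.status);
  }
  return extractText(await res.json());
}
```

Calling ocrCanvas(document.getElementById("card"), "YOUR_KEY") from the page 
would resolve to the card's text, assuming a canvas with the id "card" exists. 
Putting an API key in client-side code exposes it to anyone who views the page, 
so for anything beyond a quick experiment a small server-side proxy is probably 
the safer design.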

This may or may not be a solution.

Brent Ferguson, MLS
Librarian, Elkhart Public Library
https://myepl.org
________________________________
From: Code for Libraries <[email protected]> on behalf of Kevin 
Schlottmann <[email protected]>
Sent: Tuesday, June 16, 2020 12:48 PM
To: [email protected] <[email protected]>
Subject: [CODE4LIB] Zonal OCR for catalog cards

Hi all,

As we get deeper into our work-from-home projects, we are getting to 
collections that were richly described using catalog cards, long before 
computerized systems for discovery were adopted.  Our card scanner allows us to 
quickly convert these cards to PDF, but rather than copying-and-pasting the 
text, I'm hoping to go a step further and get structured data off of them.

I'm wondering if anyone here has ever leveraged zonal OCR, such as the kind 
used for business cards or invoices, to break out OCRed data in catalog cards. 
I did a quick Google and a search in the archives here, but didn't see anything 
right off the bat.  I think the basic tools for throwing something together are 
all there, but I'm hoping someone has already explored this and stitched 
something together.
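
To make the zonal idea concrete, here is a minimal sketch of the zone 
assignment step.  It assumes an OCR engine has already returned words with 
pixel bounding boxes (the word shape here mirrors the description and 
boundingPoly.vertices fields that the Google Cloud Vision API returns), and 
that every card shares one layout.  The zone names and coordinates are made up 
for illustration and would have to be measured from real scans.

```javascript
// Hypothetical zones for a 600x360 pixel card scan (illustrative values only).
const ZONES = [
  { name: "callNumber", x: 0, y: 0, width: 150, height: 100 },
  { name: "heading", x: 150, y: 0, width: 450, height: 100 },
  { name: "body", x: 0, y: 100, width: 600, height: 260 },
];

// Center point of a word's bounding box.
// A vertex may omit x or y when the value is 0, hence the "|| 0".
function center(vertices) {
  const xs = vertices.map((v) => v.x || 0);
  const ys = vertices.map((v) => v.y || 0);
  return {
    x: (Math.min(...xs) + Math.max(...xs)) / 2,
    y: (Math.min(...ys) + Math.max(...ys)) / 2,
  };
}

// Put each word into the first zone that contains its center point.
// Words that fall outside every zone are dropped.
function assignToZones(words, zones) {
  const buckets = {};
  for (const zone of zones) {
    buckets[zone.name] = [];
  }
  for (const word of words) {
    const c = center(word.boundingPoly.vertices);
    const zone = zones.find(
      (z) =>
        c.x >= z.x && c.x < z.x + z.width &&
        c.y >= z.y && c.y < z.y + z.height
    );
    if (zone) {
      buckets[zone.name].push(word.description);
    }
  }
  // Join the words of each zone in the order the OCR engine returned them.
  const fields = {};
  for (const name of Object.keys(buckets)) {
    fields[name] = buckets[name].join(" ");
  }
  return fields;
}
```

Calling assignToZones(words, ZONES) returns one string per zone, which could 
then be mapped to catalog fields.  Fixed zones only work when the cards follow 
a consistent layout; typed cards with varying indentation would likely need 
looser rules, such as splitting on line position instead.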

Kevin

---

Kevin Schlottmann
Head of Archives Processing
Rare Book & Manuscript Library
Butler Library, Room 801
Columbia University
535 W. 114th St., New York, NY  10027
(212) 854-8483
