Re: [CODE4LIB] Converting old tables in PDF to CSV

Amy Kirchhoff Tue, 21 Jun 2022 13:41:57 -0700

Andrea ~

I've not used it myself, but I have heard from others who do text analysis 
that Tesseract handles tabular data in scans well: 
https://github.com/tesseract-ocr/tesseract
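
For anyone who wants to try this, here is a minimal sketch of one way to turn Tesseract's output into CSV. It is not something I have tested on real scans. It assumes each page has already been exported as an image and run through the command line with the `tsv` output option (for example `tesseract page.png page tsv`), which writes one row per recognized word along with its position on the page. The function names and the `row_tol` and `col_gap` pixel thresholds are hypothetical and would need tuning for each set of scans.

```python
import csv
import io


def tsv_words(tsv_text):
    """Yield (left, top, width, text) for each word row in Tesseract TSV output."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are individual words; other levels describe
        # pages, blocks, paragraphs, and lines and carry no text.
        if row.get("level") == "5" and text:
            yield int(row["left"]), int(row["top"]), int(row["width"]), text


def words_to_rows(words, row_tol=10, col_gap=40):
    """Group words into table rows, then into cells.

    row_tol and col_gap are assumed pixel thresholds, not Tesseract settings.
    """
    rows = []
    for word in sorted(words, key=lambda w: w[1]):
        # Same row if the word's top edge is close to the row's first word.
        if rows and abs(word[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(word)
        else:
            rows.append([word])

    table = []
    for row in rows:
        cells = []
        prev_right = None
        for left, _top, width, text in sorted(row):
            # A small horizontal gap means the word continues the current cell.
            if prev_right is not None and left - prev_right <= col_gap:
                cells[-1] += " " + text
            else:
                cells.append(text)
            prev_right = left + width
        table.append(cells)
    return table


def tsv_to_csv(tsv_text, row_tol=10, col_gap=40):
    """Convert Tesseract TSV text into a CSV string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(words_to_rows(tsv_words(tsv_text), row_tol, col_gap))
    return out.getvalue()
```

Calling `tsv_to_csv(open("page.tsv", encoding="utf-8").read())` would return the CSV text for one page. Grouping by pixel gaps is a simple heuristic, so skewed scans or tightly spaced columns would likely need deskewing first or different thresholds.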


~ Amy

--
Amy J. Kirchhoff (she/her)
Constellate Text Analytics Business Manager / Portico, JSTOR 
Twitter: @AmyPlusFour

Find out about user interface releases, text analytics classes, and other 
updates in our email group (https://ithaka.groups.io/g/tdm-jstor-portico).

-----Original Message-----
From: Code for Libraries <[email protected]> On Behalf Of Medina-Smith, 
Andrea M. (Fed)
Sent: Tuesday, June 21, 2022 2:47 PM
To: [email protected]
Subject: [CODE4LIB] Converting old tables in PDF to CSV


Hello List,

Has anyone had success converting tables in a PDF to CSV? These are scans of 
paper documents from the 1970s onward. I know this isn’t a super easy conversion, 
but I would think it’s not impossible either.

Thanks,
Andrea

--

Andrea Medina-Smith
Data Librarian
Information Services Office
National Institute of Standards and Technology 
[email protected]<mailto:[email protected]>
https://orcid.org/0000-0002-1217-701X
