On 1 June 2014 11:16, Venkata Pingali <[email protected]> wrote:
>
> As it turns out, I am working on a DSL for data extraction for
> a client - pretty much what you said with some nuances. The
> client is open-source friendly and I will request for open
> sourcing the tooling.
>
The PDF specification is huge, and parsing a PDF is no simple task.
We use Python, and have had decent luck with pdfminer for parsing
text PDFs. The documentation is a little sketchy, but one can find
blogs that describe how pdfminer can be used.

Regards,
Gora
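
For anyone who wants a starting point, here is a minimal sketch of pulling text out of a text-based PDF with pdfminer. It assumes the maintained pdfminer.six fork, which provides `pdfminer.high_level.extract_text`; older pdfminer releases may need the lower-level classes instead. The function names `extract_pdf_text` and `clean_lines` are made up for illustration, and scanned (image-only) PDFs will yield little or no text this way.

```python
import re


def extract_pdf_text(path):
    """Return all text from a text-based PDF as one string."""
    # Assumption: pdfminer.six is installed (pip install pdfminer.six).
    # Imported lazily so clean_lines() below works without it.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def clean_lines(text):
    """Collapse runs of whitespace and drop empty lines (hypothetical helper)."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in text.splitlines())
    return [line for line in lines if line]


if __name__ == "__main__":
    import sys
    # Usage: python extract.py some_file.pdf
    for line in clean_lines(extract_pdf_text(sys.argv[1])):
        print(line)
```

The cleanup step is kept separate from the extraction on purpose: PDF text usually comes out with irregular spacing, and how much of it to normalise depends on the data being extracted.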
