On 1 June 2014 11:16, Venkata Pingali <[email protected]> wrote:
>
> As it turns out, I am working on a DSL for data extraction for
> a client - pretty much what you said with some nuances. The
> client is open-source friendly and I will request for open
> sourcing the tooling.
>

The PDF specification is huge, and parsing a PDF is no simple task.
We use Python and have had decent luck with pdfminer for parsing
text-based PDFs. Its documentation is a little sketchy, but there are
blog posts that describe how to use it.
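For illustration, here is a minimal sketch of pulling the text out of a
PDF with pdfminer. It assumes the maintained fork pdfminer.six
(pip install pdfminer.six), which provides the pdfminer.high_level
module. The helper name pdf_to_text is hypothetical. It only works for
PDFs that contain real text, not for scanned images.

```python
# Minimal sketch, assuming the pdfminer.six fork is installed.
# pdf_to_text is a hypothetical helper name used for illustration.


def pdf_to_text(path):
    """Return all text from a text-based (not scanned) PDF as one string."""
    # Imported inside the function so this helper can be defined even
    # where pdfminer.six is missing; the import happens on first call.
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    # LAParams tunes pdfminer's layout analysis (how characters are
    # grouped into lines and text boxes); default values are used here.
    return extract_text(path, laparams=LAParams())
```

Usage: text = pdf_to_text("report.pdf"). For tables and other layout-
sensitive extraction, you will usually need to adjust LAParams or walk
the layout objects yourself rather than rely on the flat text.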

Regards,
Gora

-- 
Datameet is a community of Data Science enthusiasts in India. Know more about 
us by visiting http://datameet.org