Hi Karthik,

Answering your second question: what's the best way to read the data? 

I recommend using Tabula - "for liberating data tables locked inside PDF 
files."
https://tabula.technology/

With this you can go to a specific page, select the area of the table with 
the mouse to exclude anything you don't need, and export the data to a CSV. 

Tabula has two extraction methods, Stream and Lattice, so be sure to try 
the other if one doesn't work out for you. 

Some manual cleanup may still be needed after extraction, for example if 
there are extra spaces, line breaks in the header, etc.
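
If the cleanup gets repetitive, part of it can be scripted. Below is a 
minimal Python sketch, assuming the CSV exported from Tabula has been read 
into a string. The function names (clean_cell, clean_csv) and the sample 
rows are made up for illustration and are not taken from the PLFS report.

```python
import csv
import io
import re


def clean_cell(cell):
    """Collapse line breaks and runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", cell).strip()


def clean_csv(raw_text):
    """Parse CSV text, tidy every cell, and drop rows that end up empty."""
    # newline="" lets the csv module handle line breaks inside quoted cells
    reader = csv.reader(io.StringIO(raw_text, newline=""))
    rows = [[clean_cell(cell) for cell in row] for row in reader]
    return [row for row in rows if any(row)]


# Hypothetical sample: a header broken across two lines, padded cells,
# and an empty row of the kind an extraction can leave behind.
sample = 'Region,"Header split\nover  two lines"\n  Region A ,  12.5 \n,\n'
cleaned = clean_csv(sample)

for row in cleaned:
    print(row)
```

This only handles whitespace and empty rows. Merged cells or columns that 
were split in the wrong place will probably still need fixing by hand.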

Note: If the table you need is a scanned image in this PDF, then Tabula is 
not applicable. It only works with text-based (vector) PDFs.


Regards,
Nikhil VJ, Pune, India
https://nikhilvj.co.in/


On Wednesday, June 5, 2019 at 6:17:54 PM UTC+5:30, Karthik Shashidhar wrote:
>
> As you would expect from the Indian government, while the Periodic Labour 
> Force Survey for 2017-18 was finally released, it's been released as a 
> single PDF. Has anyone succeeded in converting all the tables into an easy 
> to read format such as CSV or Excel? 
>
>
> http://mospi.nic.in/sites/default/files/publication_reports/Annual%20Report%2C%20PLFS%202017-18_31052019.pdf
>
>
> And if nobody has managed to get the data into excel yet, what's the best 
> way to read the data? 
>
> Thanks
> Karthik
>
>

-- 
Datameet is a community of Data Science enthusiasts in India. Know more about 
us by visiting http://datameet.org