Hello Karthik and Nikhil,

On 05/06/19 8:52 pm, Nikhil VJ wrote:
> Hi Karthik,
> 
> Answering your second question: what's the best way to read the data? 
> 
> I recommend using Tabula - "for liberating data tables locked inside PDF
> files."
> https://tabula.technology/
> 
> With this you can go to a specific page, select the area of the table
> with the mouse to exclude unnecessary things, and extract the data to a CSV.
> 
> There are two ways it uses to extract, so be sure to try the other if
> one doesn't work out for you. 
> 
> Some manual work may be needed after extraction in case there are
> extra spaces, line breaks in the header, etc.
> 
> Note: If the table you need is a scanned image in this PDF, then Tabula
> is not applicable. It only works with vector data.
> 

Tabula is a great tool for extracting tabular data from PDFs (we rely
on it for quite a few tasks at my company), but it sometimes fails to
produce output that can be used without a lot of cleanup. In such
cases we use another tool, "Camelot: PDF Table Extraction for Humans":

https://camelot-py.readthedocs.io/

I created a sample CSV document for the table on page 131 of the report
that you linked. Please find it attached.
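
For reference, here is a minimal sketch of how such an extraction might
look with Camelot's Python API. The function name, the file names, and
the choice of the "stream" flavor are assumptions for illustration; the
settings used for the attached CSV are not stated here, so you may need
to try "lattice" as well.

```python
def extract_page_to_csv(pdf_path, page, out_csv, flavor="stream"):
    """Save the first table Camelot finds on one page as a CSV file.

    pdf_path, page and out_csv are placeholders chosen for this sketch.
    """
    # Imported lazily so the function can be defined without Camelot installed.
    import camelot  # third-party package: pip install camelot-py

    # Camelot expects the page selection as a string, e.g. "131".
    tables = camelot.read_pdf(pdf_path, pages=str(page), flavor=flavor)
    if len(tables) == 0:
        raise ValueError(f"no table found on page {page}")

    table = tables[0]
    table.to_csv(out_csv)
    # The parsing report gives a rough idea of extraction quality.
    return table.parsing_report


# Hypothetical usage, assuming the report was saved as report.pdf:
# report = extract_page_to_csv("report.pdf", 131, "page131.csv")
# print(report)
```

If the result looks off, switching the flavor or restricting the table
area is usually the first thing to try.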

Like Tabula, Camelot also has a web interface, Excalibur, that you can
use to select a particular area of the page and get a CSV.

https://www.tryexcalibur.com/

I hope this helps! Feel free to reach out to me if you have any queries
or would like some assistance along the way.

Cheers! :)

-- 
Dhanesh B. Sabane
https://dhanesh95.gitlab.io
PGP ID: 0xB69A98C9C1642329
Fingerprint: 9655 11F2 0D18 E76A 2396 D64D B69A 98C9 C164 2329

-- 
Datameet is a community of Data Science enthusiasts in India. Know more about 
us by visiting http://datameet.org
"‘OBC,  while  for  SC  and  ‘others’,  WPR  remained  at  the  same  level  during  these  two","","","","",""
"periods.","","","","",""
"","Statement 43: Worker Population Ratio (WPR) (in per cent) according to usual status (ps+ss) for different social","","","",""
"","groups during  NSS 61st (2004-05), 66th (2009-10), 68th (2011-12) rounds and PLFS (2017-18)","","","",""
"","","","","","all- India"
"NSS rounds","","","household social group","",""
"(year)","ST","SC","OBC","Others","all (incl. n.r.)"
"(1)","(2)","(3)","(4)","(5)","(6)"
"","","rural male","","",""
"PLFS (2017-18)","53.8","52.3","50.5","52.2","51.7"
"68th(2011-12)","55.7","53.9","53.8","55.2","54.3"
"66th (2009-10)","55.9","54.8","54.0","55.2","54.7"
"61st (2004-05)","56.2","54.5","53.7","55.7","54.6"
"","","rural female","","",""
"PLFS (2017-18)","27.0","17.4","16.8","14.1","17.5"
"68th(2011-12)","36.4","26.2","23.9","20.1","24.8"
"66th (2009-10)","35.9","26.9","26.7","19.9","26.1"
"61st (2004-05)","46.4","33.3","33.0","26.2","32.7"
"","","urban male","","",""
"PLFS (2017-18)","49.9","52.5","53.2","53.1","53.0"
"68th(2011-12)","52.0","54.5","54.6","54.9","54.6"
"66th (2009-10)","51.0","55.0","54.3","54.2","54.3"
"61st (2004-05)","52.3","53.7","55.4","55.0","54.9"
"","","urban female","","",""
"PLFS (2017-18)","17.0","17.2","14.3","12.6","14.2"
"68th(2011-12)","19.2","17.2","15.1","12.9","14.7"
"66th (2009-10)","20.3","17.8","14.5","11.3","13.8"
"61st (2004-05)","24.5","20.0","18.5","13.4","16.6"
"","Note:  The figures are to be read along with the explanatory note for comparability.","","","",""
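
The rows above still carry title lines, header rows, and sub-headers
such as "rural male" that sit in rows of their own, so some tidying is
needed before analysis. A minimal sketch using only Python's standard
library follows. It assumes the six-column layout shown above; the
function name, the segment list, and the short group labels are choices
made for this example, not part of the extracted file.

```python
import csv
import io

# Sub-header labels and column order as they appear in the extracted CSV.
SEGMENTS = {"rural male", "rural female", "urban male", "urban female"}
GROUPS = ["ST", "SC", "OBC", "Others", "all"]


def tidy_rows(csv_text):
    """Return (segment, round, group, value) records from the raw CSV text."""
    records = []
    segment = None
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [c.strip() for c in row]
        non_empty = [c for c in cells if c]
        # A sub-header row contains nothing but a segment label.
        if len(non_empty) == 1 and non_empty[0] in SEGMENTS:
            segment = non_empty[0]
            continue
        # A data row has a round label followed by five numeric cells.
        if segment and len(cells) == 6 and cells[0]:
            try:
                values = [float(c) for c in cells[1:]]
            except ValueError:
                continue  # header or note line, not data
            for group, value in zip(GROUPS, values):
                records.append((segment, cells[0], group, value))
    return records


sample = '''"","","rural male","","",""
"PLFS (2017-18)","53.8","52.3","50.5","52.2","51.7"
"68th(2011-12)","55.7","53.9","53.8","55.2","54.3"
'''
records = tidy_rows(sample)
print(records[0])  # ('rural male', 'PLFS (2017-18)', 'ST', 53.8)
```

The long format (one value per record) is just one option; it tends to
be easier to filter and plot than the wide layout of the original table.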
