Hello Karthik and Nikhil,

On 05/06/19 8:52 pm, Nikhil VJ wrote:
> Hi Karthik,
>
> Answering your second question: what's the best way to read the data?
>
> I recommend using Tabula - "for liberating data tables locked inside PDF
> files."
> https://tabula.technology/
>
> With this you can go to a specific page, select the area of the table
> with mouse to exclude unnecessary things, and extract the data to a CSV.
>
> There are two ways it uses to extract, so be sure to try the other if
> one doesn't work out for you.
>
> And some manual work may be needed after extraction in case there's
> extra spaces, line-breaks in the header, etc.
>
> Note: If the table you need is a scanned image in this PDF, then tabula
> is not applicable. Only works with vector data.
>
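Some of the manual cleanup Nikhil mentions (extra spaces, line breaks inside header cells, blank rows) can be scripted. Below is a minimal sketch using only the Python standard library, assuming the extractor's output is already available as CSV text. The function name `clean_extracted_csv` is hypothetical and is not part of Tabula or Camelot.

```python
import csv
import io
import re


def clean_extracted_csv(text: str) -> list[list[str]]:
    """Tidy CSV text exported by a PDF table extractor (hypothetical helper).

    Collapses any run of whitespace, including line breaks inside quoted
    cells, to a single space, trims every cell, and drops rows in which
    every cell is empty.
    """
    rows = []
    # csv.reader handles quoted fields that span several lines.
    for row in csv.reader(io.StringIO(text)):
        cleaned = [re.sub(r"\s+", " ", cell).strip() for cell in row]
        if any(cleaned):  # skip rows made up only of empty cells
            rows.append(cleaned)
    return rows
```

For example, a header cell exported as `"NSS rounds\n(year)"` comes back as `NSS rounds (year)`. Anything structural, such as headers split across two rows, still needs a human eye.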
While Tabula is a great tool for extracting tabular data from PDFs (we rely on it for quite a few tasks at my company), it sometimes fails to extract a table in a form that is easy to use. In such cases, we turn to another tool, "Camelot: PDF Table Extraction for Humans":
https://camelot-py.readthedocs.io/

I created a sample CSV of the table on page 131 of the report that you linked. Please find it attached.

Like Tabula, Camelot also has a web interface, Excalibur, that you can use to select a particular area of a table and export it as CSV:
https://www.tryexcalibur.com/

I hope this helps! Feel free to reach out if you have any questions or would like some assistance along the way.

Cheers! :)

--
Dhanesh B. Sabane
https://dhanesh95.gitlab.io
PGP ID: 0xB69A98C9C1642329
Fingerprint: 9655 11F2 0D18 E76A 2396 D64D B69A 98C9 C164 2329

--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
"OBC, while for SC and 'others', WPR remained at the same level during these two","","","","",""
"periods.","","","","",""
"","Statement 43: Worker Population Ratio (WPR) (in per cent) according to usual status (ps+ss) for different social","","","",""
"","groups during NSS 61st (2004-05), 66th (2009-10), 68th (2011-12) rounds and PLFS (2017-18)","","","",""
"","","","","","all-India"
"NSS rounds","","","household social group","",""
"(year)","ST","SC","OBC","Others","all (incl. n.r.)"
"(1)","(2)","(3)","(4)","(5)","(6)"
"","","rural male","","",""
"PLFS (2017-18)","53.8","52.3","50.5","52.2","51.7"
"68th (2011-12)","55.7","53.9","53.8","55.2","54.3"
"66th (2009-10)","55.9","54.8","54.0","55.2","54.7"
"61st (2004-05)","56.2","54.5","53.7","55.7","54.6"
"","","rural female","","",""
"PLFS (2017-18)","27.0","17.4","16.8","14.1","17.5"
"68th (2011-12)","36.4","26.2","23.9","20.1","24.8"
"66th (2009-10)","35.9","26.9","26.7","19.9","26.1"
"61st (2004-05)","46.4","33.3","33.0","26.2","32.7"
"","","urban male","","",""
"PLFS (2017-18)","49.9","52.5","53.2","53.1","53.0"
"68th (2011-12)","52.0","54.5","54.6","54.9","54.6"
"66th (2009-10)","51.0","55.0","54.3","54.2","54.3"
"61st (2004-05)","52.3","53.7","55.4","55.0","54.9"
"","","urban female","","",""
"PLFS (2017-18)","17.0","17.2","14.3","12.6","14.2"
"68th (2011-12)","19.2","17.2","15.1","12.9","14.7"
"66th (2009-10)","20.3","17.8","14.5","11.3","13.8"
"61st (2004-05)","24.5","20.0","18.5","13.4","16.6"
"","Note: The figures are to be read along with the explanatory note for comparability.","","","",""
