[datameet] Re: Need some Guidence on Parsing Electoral Roles.

nishadh Sun, 20 Aug 2017 08:41:28 -0700

Oops, the vowel issues are the same as the ones Nikhil pointed out in 
https://github.com/tabulapdf/tabula/issues/303.
 
Sorry, I have very limited knowledge of Devanagari fonts.
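
A rough way to check whether vowel signs survive extraction is to count the combining marks in the Devanagari block of a string, then compare the text copied from the PDF with the extracted text. This is only a sketch: it assumes the missing vowels are dependent vowel signs (matras), which the issue above does not confirm, and the function names are made up for illustration.

```python
import unicodedata


def _is_devanagari(ch):
    # Basic Devanagari Unicode block.
    return "\u0900" <= ch <= "\u097f"


def count_combining_marks(text):
    """Count Devanagari combining marks (vowel signs, virama, anusvara, ...)."""
    return sum(
        1
        for ch in text
        if _is_devanagari(ch) and unicodedata.category(ch) in ("Mn", "Mc")
    )


def marks_per_letter(text):
    """Ratio of combining marks to Devanagari letters; 0.0 if there are no letters."""
    letters = sum(
        1
        for ch in text
        if _is_devanagari(ch) and unicodedata.category(ch) == "Lo"
    )
    return count_combining_marks(text) / letters if letters else 0.0
```

If the ratio for an extracted cell is much lower than for the same text copied from the PDF, marks were probably dropped along the way.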


On Sunday, August 20, 2017 at 8:55:01 PM UTC+5:30, nishadh wrote:
>
> Hi,
>
> There is a Python wrapper for Tabula, 
> https://github.com/chezou/tabula-py, which converts PDF tables into pandas 
> DataFrames. I tried it on a sample electoral roll PDF from 
> https://ceo.maharashtra.gov.in/Search/SearchPDF.aspx and it converted a 
> single-page table into a pandas DataFrame. The DataFrame has to be written 
> to CSV with the 'utf-8' encoding. In the Jupyter notebook and in the CSV 
> file, the Devanagari text looked much the same as in the PDF; however, I 
> could see that vowels are missing in the printed output. A closer look 
> might sort this out. Preprocessing the PDF might also give better results: 
> either split it into single pages (this is mandatory, as even a single 
> page takes a few seconds) or crop out each individual electoral entry 
> table. The pyPdf library is a good help for that.
>
>   
>
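
The workflow quoted above could look roughly like the sketch below. It assumes tabula-py, pandas and a Java runtime are installed. The function names and the output naming scheme are hypothetical, and the `pages` argument of `tabula.read_pdf` is used to pick one page instead of splitting the PDF with pyPdf first.

```python
from pathlib import Path


def csv_path_for_page(pdf_path, page):
    """Build an output path such as 'roll_page1.csv' next to the PDF (made-up naming scheme)."""
    pdf_path = Path(pdf_path)
    return pdf_path.with_name(f"{pdf_path.stem}_page{page}.csv")


def page_to_csv(pdf_path, page=1):
    """Extract the tables on one page with tabula-py and save them as UTF-8 CSV."""
    # Imported here so the helper above works without tabula-py, pandas or Java.
    import pandas as pd
    import tabula

    result = tabula.read_pdf(str(pdf_path), pages=page)
    # Older tabula-py versions return one DataFrame, newer ones a list of them.
    tables = result if isinstance(result, list) else [result]
    tables = [t for t in tables if t is not None]
    if not tables:
        raise ValueError(f"no table found on page {page}")

    out = csv_path_for_page(pdf_path, page)
    # utf-8 keeps the Devanagari text intact in the CSV file.
    pd.concat(tables, ignore_index=True).to_csv(out, index=False, encoding="utf-8")
    return out
```

For example, `page_to_csv("roll.pdf", page=1)` would write `roll_page1.csv` beside a PDF named `roll.pdf` (a placeholder file name).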

-- 
Datameet is a community of Data Science enthusiasts in India. Know more about 
us by visiting http://datameet.org
