Oops, the vowel issues are the same as the ones Nikhil pointed out in https://github.com/tabulapdf/tabula/issues/303. Sorry, I have very limited knowledge of Devanagari fonts.
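In case it helps with the "close observation" part, here is a rough sketch of how one might check extracted text for missing vowel signs. It uses only the standard library. The function name `devanagari_mark_counts` is my own, and it assumes that a run of Devanagari letters with almost no combining marks is a hint that the matras were dropped during extraction. That is a heuristic, not a proof.

```python
import unicodedata


def devanagari_mark_counts(text):
    """Return (marks, letters) counted over the Devanagari characters in text."""
    marks = letters = 0
    for ch in text:
        if not "\u0900" <= ch <= "\u097f":
            continue  # skip anything outside the Devanagari block
        cat = unicodedata.category(ch)
        if cat in ("Mn", "Mc"):
            # combining marks: vowel signs (matras), virama, anusvara, ...
            marks += 1
        elif cat == "Lo":
            # consonants and independent vowels
            letters += 1
    return marks, letters
```

Running this on a cell from the CSV and on the same word typed by hand should give similar counts. If the extracted text has many letters and close to zero marks, the vowel signs were probably lost.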
On Sunday, August 20, 2017 at 8:55:01 PM UTC+5:30, nishadh wrote:
> Hi,
>
> There is a Python-based wrapper for Tabula, https://github.com/chezou/tabula-py. It converts PDF tables into pandas DataFrames. I tried it with a sample electoral roll PDF from https://ceo.maharashtra.gov.in/Search/SearchPDF.aspx and it did convert a single-page table into a pandas DataFrame. The DataFrame has to be written with 'utf-8' encoding to convert the output into CSV. In the Jupyter notebook and in the CSV file, the Devanagari text looked similar to the PDF, but I could see that vowels are missing in the output; a closer look could sort this out. Pre-processing the PDF may fetch better results, either by splitting it into single pages (which is mandatory, as even a single page takes a few seconds) or by cropping out each single electoral entry table. The pyPdf library is a good help for that.

--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
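For reference, the tabula-py workflow described in the quoted message could look roughly like the sketch below. It assumes tabula-py and a Java runtime are installed. The helper names `csv_name_for_page` and `page_to_csv` and the output naming scheme are made up for illustration, and I have not run this against the Maharashtra PDFs.

```python
from pathlib import Path


def csv_name_for_page(pdf_path, page):
    """Build an output name such as 'roll_p1.csv' next to the source PDF."""
    p = Path(pdf_path)
    return str(p.with_name(f"{p.stem}_p{page}.csv"))


def page_to_csv(pdf_path, page=1):
    """Extract the tables on one page with tabula-py and save them as UTF-8 CSV."""
    import tabula  # third-party: pip install tabula-py (needs Java)

    # Recent tabula-py releases return a list of DataFrames; older ones may
    # return a single DataFrame, so normalise to a list.
    tables = tabula.read_pdf(pdf_path, pages=page)
    if not isinstance(tables, list):
        tables = [tables]

    out_paths = []
    for i, df in enumerate(tables):
        out = csv_name_for_page(pdf_path, f"{page}_{i}")
        # utf-8 keeps the Devanagari text intact in the CSV file
        df.to_csv(out, index=False, encoding="utf-8")
        out_paths.append(out)
    return out_paths
```

Calling `page_to_csv("roll.pdf", page=1)` would write files such as `roll_p1_0.csv`, one per table found on that page. Working one page at a time keeps each call short, in line with the single-page suggestion above.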