@Anand: You might find this useful as well: https://github.com/datameet/india-election-data
A lot of PDFs have been parsed and dumped here.
Sample: http://nbviewer.ipython.org/github/datameet/india-election-data/blob/master/parliament-elections/election.ipynb

Regards,
Konark

On Sat, Mar 1, 2014 at 5:53 PM, Anand Chitipothu <[email protected]> wrote:

> On Tue, Feb 18, 2014 at 2:59 PM, Raphael Susewind <[email protected]> wrote:
>
>> Hey everybody,
>>
>> I am working on PDF electoral rolls, but struggle with unicode
>> conversion issues (a Crystal Reports bug in the version the ECI
>> currently uses, at least in some states such as UP or Gujarat, which
>> leads to a corrupted ToUnicode CMap, which means you cannot properly copy
>> and paste from the PDF, or otherwise extract proper UTF-8). If your 'free
>> the pdf event' finds a way around this, do let me know - likewise I
>> shall send any progress from my side...
>>
>
> For generating the list of polling booths, I gave up parsing Kannada PDFs and
> used the polling booth names specified in Kannada on the website. I've
> transliterated the names using the unidecode Python library and replaced some
> common words.
>
> For example:
>
> http://ge2014.anandology.com/KA/AC001
>
> Anand
>
> --
> For more details about this list
> http://datameet.org/discussions/
> ---
> You received this message because you are subscribed to the Google Groups
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
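
For anyone who wants to try the approach Anand describes (transliterate the Kannada booth names with unidecode, then replace some common words), here is a rough sketch of how it could look. This is not his actual script: the function names and the COMMON_WORDS table are hypothetical, and the romanized keys are placeholders that would need to be taken from whatever unidecode really produces for the words in question.

```python
# Hypothetical replacement table: romanized word -> preferred English word.
# These entries are illustrative only; they are NOT the ones used for
# ge2014.anandology.com, and the keys should be checked against real
# unidecode output before use.
COMMON_WORDS = {
    "shaale": "School",
    "srkaari": "Government",
}


def replace_common_words(ascii_name, replacements=COMMON_WORDS):
    """Replace known romanized words and collapse runs of whitespace."""
    words = ascii_name.split()
    # Lookup is case-insensitive; unknown words are kept unchanged.
    return " ".join(replacements.get(word.lower(), word) for word in words)


def transliterate_booth_name(name, replacements=COMMON_WORDS):
    """Romanize a booth name with unidecode, then replace common words."""
    # Third-party dependency (pip install Unidecode), imported lazily so the
    # word-replacement step can be used and tested without it.
    from unidecode import unidecode

    return replace_common_words(unidecode(name), replacements)
```

Calling transliterate_booth_name() on a Kannada name gives a plain ASCII string. How readable it is depends mostly on how complete the replacement table is, since unidecode works character by character and knows nothing about the language.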
