@Anand: You might find this useful as well.

https://github.com/datameet/india-election-data

A lot of PDFs have been parsed and dumped here.

Sample:
http://nbviewer.ipython.org/github/datameet/india-election-data/blob/master/parliament-elections/election.ipynb

Regards
Konark


On Sat, Mar 1, 2014 at 5:53 PM, Anand Chitipothu <[email protected]> wrote:

> On Tue, Feb 18, 2014 at 2:59 PM, Raphael Susewind <
> [email protected]> wrote:
>
>> Hey everybody,
>>
>> I am working on PDF electoral rolls, but I am struggling with Unicode
>> conversion issues. A Crystal Reports bug in the version the ECI
>> currently uses, at least in some states such as UP or Gujarat, leads
>> to a corrupted ToUnicode CMap, which means you cannot properly copy
>> and paste from the PDF or otherwise extract proper UTF-8. If your 'free
>> the pdf event' finds a way around this, do let me know; likewise, I
>> shall send any progress from my side...
>>
>
> To generate the list of polling booths, I gave up parsing the Kannada PDFs
> and used the polling booth names given in Kannada on the website. I
> transliterated the names using the unidecode Python library and replaced
> some common words.
>
> For example:
>
> http://ge2014.anandology.com/KA/AC001
>
> Anand
>
> --
> For more details about this list
> http://datameet.org/discussions/
> ---
> You received this message because you are subscribed to the Google Groups
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
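
For anyone wanting to try the transliteration approach Anand describes above,
here is a rough sketch of how it might look. It is not his actual code. The
function names and the COMMON_WORDS table are hypothetical, since the thread
does not say which words were replaced. The only external call is
unidecode() from the third-party Unidecode package.

```python
import re

# Hypothetical replacement table: the thread does not say which words were
# replaced, so these entries are placeholders for illustration only.
COMMON_WORDS = {
    "shaale": "School",
    "kacheri": "Office",
}


def replace_common_words(text, replacements=COMMON_WORDS):
    """Swap whole words found in `replacements`, ignoring case."""
    def swap(match):
        word = match.group(0)
        # Leave the word untouched when it is not in the table.
        return replacements.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", swap, text)


def transliterate_booth_name(name, replacements=COMMON_WORDS):
    """Transliterate a Kannada booth name to ASCII, then tidy common words."""
    # Third-party dependency (pip install Unidecode). Imported here so the
    # word-replacement helper above can be used without it.
    from unidecode import unidecode

    ascii_name = unidecode(name)
    ascii_name = " ".join(ascii_name.split())  # collapse runs of whitespace
    return replace_common_words(ascii_name, replacements)
```

Calling transliterate_booth_name() on a name copied from the website should
give an ASCII approximation. The spellings unidecode produces are rough, so
the replacement table would need to be built by looking at real output.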

