I request you all not to publicise this. Let us limit the discussion to technical matters and not discuss the data itself. Let us not put the data in the public domain.
ECI has placed a lot of restrictions on the data to deny access. If they see the voter records being reconstructed and made public, that may not go down well with them. If anyone wants to discuss the issue further, please call me.

Warm Regards,
PG
990 014 1232

From: Nikhil VJ [mailto:[email protected]]
Sent: 25 November 2020 13:01
To: datameet
Cc: PG Bhat
Subject: Re: [datameet] Electoral Rolls Karnataka - Request for resources

Hi,

Just to update, I got in touch with Mr. PG and we have set up a workflow on a cloud server, and it's chugging along nicely.

What the program does is cool: it runs a Python library, ocrmypdf, in bulk mode. This description from their docs is what it's mainly doing:

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched.

I made some tweaks to PG's program and have put it on GitHub here: https://github.com/answerquest/bulk_pdf_OCR/

I think it may be useful in other places too.

--
Cheers,
Nikhil VJ
https://nikhilvj.co.in

On Tue, Nov 24, 2020 at 12:59 AM Anirudh K <[email protected]> wrote:

Hi all,

The Chief Electoral Officer - Karnataka has published a new version of the Electoral Rolls. These are image-based PDFs that have to be converted to text-based PDFs. There is a need for additional compute resources to convert these large files.

If anyone would like to help with this, the process would entail running a Python script (already made) on Google Colab and sharing the output folder on Google Drive. A more technical description of the process is detailed below. Please reach out to [email protected] (or call PG Bhat - 9900141232) to help out with this project, or in case of any queries.

The full process:
1. Create a shared folder on Drive called 'ERMS' and give edit access to [email protected].
2. He will create 3 subfolders:
* Code - This will contain the script. There is no need for any software to be installed locally.
* Image files - This houses the image files.
* Text files - This is where the script will write the results.
3. Run the script on Colab (free account). The text files can then be downloaded from the Drive folder.

Thank you for considering this request.

Regards,
Anirudh

--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/datameet/be9e4621-03a6-4e7e-8dfd-51ab93478b4en%40googlegroups.com.
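The bulk OCR step discussed in this thread (running ocrmypdf over every scanned PDF in the 'Image files' folder and writing searchable PDFs to 'Text files') can be sketched roughly as follows. This is a minimal illustration, not the code in the bulk_pdf_OCR repository: it assumes OCRmyPDF's Python API (`ocrmypdf.ocr(input, output, ...)`) and a local copy of the Drive folder layout, and the function names `plan_jobs` and `run_bulk_ocr` are hypothetical. The default `language="eng"` is also an assumption; rolls printed in Kannada would need the matching Tesseract language data instead.

```python
# Hypothetical bulk-OCR driver: a sketch, not the bulk_pdf_OCR repo's code.
from pathlib import Path


def plan_jobs(image_dir, text_dir):
    """Pair each scanned PDF in image_dir with an output path in text_dir.

    Files whose output already exists are skipped, so a run that was
    interrupted (e.g. a Colab session timing out) can simply be restarted.
    Only lowercase '*.pdf' names are matched.
    """
    image_dir, text_dir = Path(image_dir), Path(text_dir)
    jobs = []
    for src in sorted(image_dir.glob("*.pdf")):
        dst = text_dir / src.name
        if not dst.exists():
            jobs.append((src, dst))
    return jobs


def run_bulk_ocr(image_dir, text_dir, language="eng"):
    """OCR every pending file; language="eng" is an assumed default."""
    # Imported here so plan_jobs can be used without OCRmyPDF installed.
    import ocrmypdf

    Path(text_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_jobs(image_dir, text_dir):
        # Adds a searchable text layer and writes a new PDF to dst.
        ocrmypdf.ocr(str(src), str(dst), language=language)
        print(f"done: {src.name}")
```

Usage would be along the lines of `run_bulk_ocr("ERMS/Image files", "ERMS/Text files")`. Processing files one at a time and skipping finished outputs trades speed for the ability to resume cleanly; a parallel version could hand `plan_jobs(...)` to a process pool instead.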
