I request you all not to publicise this. Let us limit the discussion to
technical matters and not discuss the data itself. Let us not put the data
anywhere in the public domain.

 

ECI has placed a lot of restrictions on the data to limit access. If they see
the voter records being reconstructed and made public, they may not take it
kindly.

 

If anyone wants to discuss the issue further, please call me. 

 

Warm Regards,

PG

990 014 1232

 

From: Nikhil VJ [mailto:[email protected]] 
Sent: 25 November 2020 13:01
To: datameet
Cc: PG Bhat
Subject: Re: [datameet] Electoral Rolls Karnataka - Request for resources

 

Hi,

 

Just to update, I got in touch with Mr. PG and we have set up a workflow on a
cloud server, and it's chugging along nicely.

 

What the program does is cool - it runs a Python library, ocrmypdf, in
bulk mode.

 

This description from their docs is what it's mainly doing:
OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF 
files, allowing them to be searched.
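
For anyone curious what "bulk mode" could look like, here is a minimal sketch. It is not the code in the repository. The function names `plan_jobs` and `ocr_folder`, the folder arguments, and the default language `eng` are all assumptions made for illustration. The only library call used is `ocrmypdf.ocr`, with its `language` and `skip_text` options.

```python
from pathlib import Path


def plan_jobs(input_dir, output_dir):
    """Pair each input PDF with its output path, skipping finished files."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    jobs = []
    for src in sorted(input_dir.glob("*.pdf")):
        dst = output_dir / src.name
        if not dst.exists():  # lets an interrupted run pick up where it stopped
            jobs.append((src, dst))
    return jobs


def ocr_folder(input_dir, output_dir, languages=("eng",)):
    """Add a text layer to every pending PDF; return the files that failed.

    `languages` is an assumption: pass whichever Tesseract language
    packs are installed and match the documents.
    """
    import ocrmypdf  # imported here so plan_jobs works without it installed

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for src, dst in plan_jobs(input_dir, output_dir):
        try:
            # skip_text=True leaves pages that already contain text untouched
            ocrmypdf.ocr(str(src), str(dst), language=list(languages), skip_text=True)
        except Exception as exc:  # one bad file should not stop the batch
            failed.append((src.name, str(exc)))
    return failed
```

Calling `ocr_folder("scans", "searchable")` would process every PDF in a hypothetical `scans` folder. Because finished outputs are skipped, the same call can be repeated after an interruption.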

 

I made some tweaks to PG's program and have put it on GitHub here:
https://github.com/answerquest/bulk_pdf_OCR/

 

I think it may be useful in other places too.




--
Cheers,
Nikhil VJ
https://nikhilvj.co.in

 

 

On Tue, Nov 24, 2020 at 12:59 AM Anirudh K <[email protected]> wrote:

Hi all,

 

The Chief Electoral Officer - Karnataka has published a new version of the
Electoral Rolls. These are image-based PDFs that have to be converted to
text-based PDFs.

 

There is a need for additional compute resources to convert these large files.
If anyone would like to help with this, the process would entail running a
Python script (already made) on Google Colab and sharing the output folder on
Google Drive. A more technical description of the process is detailed below.

 

Please reach out to [email protected] (or call PG Bhat - 9900141232) to help 
out with this project, or in case of any queries. 

 

The full process:

1. Create a shared folder on Drive called 'ERMS' and give edit access to
   [email protected].
2. He will create 3 subfolders:
   * Code - contains the script. There is no need for any software to be
     installed locally.
   * Image files - houses the image files.
   * Text files - where the script will write the results.
3. Run the script on Colab (free account). The text files can then be
   downloaded from the Drive folder.
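
As a rough illustration of the folder layout in the steps above, the sketch below resolves the three subfolders under a mounted Drive and reports any that are missing. It is not the actual script. The helper names, the mount point `/content/drive`, and the `MyDrive` location of the shared folder are assumptions. `drive.mount` from `google.colab` is only available inside a Colab notebook.

```python
from pathlib import Path

# Subfolder names taken from the steps above.
SUBFOLDERS = ("Code", "Image files", "Text files")


def erms_paths(drive_root):
    """Return the three ERMS subfolder paths under a mounted Drive root."""
    base = Path(drive_root) / "ERMS"
    return {name: base / name for name in SUBFOLDERS}


def missing_folders(drive_root):
    """List the expected subfolders that do not exist yet."""
    return [name for name, path in erms_paths(drive_root).items()
            if not path.is_dir()]


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive; this only works inside a Colab notebook."""
    from google.colab import drive  # not importable outside Colab
    drive.mount(mount_point)
    # Assumption: the shared ERMS folder sits in the volunteer's own Drive.
    return Path(mount_point) / "MyDrive"
```

In a notebook, `missing_folders(mount_drive())` would return an empty list once all three subfolders have been created.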

Thank you for considering this request.

 

Regards,

Anirudh

 

 

-- 
Datameet is a community of Data Science enthusiasts in India. Know more about 
us by visiting http://datameet.org
--- 
You received this message because you are subscribed to the Google Groups 
"datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/datameet/be9e4621-03a6-4e7e-8dfd-51ab93478b4en%40googlegroups.com.

