We downloaded all of them manually, but we have been unable to merge them. The text is in Kannada, and it may be unclear even if we manage to extract it. In any case, we have manually merged some data from these 190+ files into the data we had already prepared. Digitizing everything from the affidavit filing stage onward would have helped us greatly, but sadly neither BBMP, the EC, nor the SEC has such a system.
On 18 August 2015 at 22:03, Bhanu Kamapantula <[email protected]> wrote:

> Hi Shree,
>
> You would want to write a script which can scrape the data from the
> website. This can be automated using Python, mechanize library (with
> support for doPostBack calls as in these webpages).
>
> Once downloaded, PDFs can be combined using PDFtk library (one among
> different methods). Then, XPDF might be useful to retrieve text from the
> combined PDF.
>
> best,
> Bhanu
>
> On Tue, Aug 18, 2015 at 4:01 AM, Shree D N <[email protected]> wrote:
>
>> This link has form 7 (list of candidates) for all contesting candidates
>> for BBMP polls 2015. Typically all pdfs, scanned and uploaded. Language:
>> Kannada
>> http://117.247.176.82/
>> Is there a way to download it all and merge into one document or a table
>> that represents the list of all candidates for all wards??
>> We are trying to put this together because this consolidated list is not
>> available anywhere as far as I know. Has anyone else seen it?
>> --
>> -------
>> Cheers,
>>
>> *Shree | Associate Editor | *
>> *Oorvani Foundation**Citizen Matters
>> <http://bangalore.citizenmatters.in> - Bangalore's own online news magazine*
>> Bangalore | Tel: +91-80-4173 7584 | Mobile: +91-95909 35559
>> Follow us on Twitter <https://twitter.com/citizenmatters> | Follow us on
>> Facebook <https://www.facebook.com/citizenmatters>
>>
>> --
>> Datameet is a community of Data Science enthusiasts in India. Know more
>> about us by visiting http://datameet.org
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "datameet" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Bhanu

--
-------
Cheers,

*Shree | Associate Editor | *
*Oorvani Foundation**Citizen Matters
<http://bangalore.citizenmatters.in> - Bangalore's own online news magazine*
Bangalore | Tel: +91-80-4173 7584 | Mobile: +91-95909 35559
Follow us on Twitter <https://twitter.com/citizenmatters> | Follow us on
Facebook <https://www.facebook.com/citizenmatters>
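
For anyone picking this thread up later, the merge-and-extract steps Bhanu suggested (combine the PDFs with PDFtk, then pull text with Xpdf) could be sketched roughly as below. This is only a sketch under assumptions: it assumes the `pdftk` and `pdftotext` command-line tools are installed and on the PATH, and the function names, the `form7_pdfs` folder and the output file names are made up for illustration. The download step is left out because it depends on the site's form fields. Since the files are scanned images, `pdftotext` may well return little or no text, in which case an OCR pass with Kannada support would probably be needed first.

```python
import subprocess
from pathlib import Path


def pdftk_merge_command(pdf_paths, merged_path):
    """Build the pdftk command that concatenates pdf_paths into merged_path."""
    if not pdf_paths:
        raise ValueError("no PDF files to merge")
    # pdftk syntax: pdftk in1.pdf in2.pdf ... cat output merged.pdf
    return ["pdftk", *map(str, pdf_paths), "cat", "output", str(merged_path)]


def pdftotext_command(pdf_path, text_path):
    """Build the Xpdf pdftotext command; -enc UTF-8 keeps Kannada characters."""
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), str(text_path)]


def merge_and_extract(pdf_dir, merged_path="form7_all.pdf",
                      text_path="form7_all.txt"):
    """Merge every PDF in pdf_dir, then try to extract text from the result.

    The default file names are hypothetical placeholders.
    """
    merged = Path(merged_path).resolve()
    # Sort by file name for a repeatable order, and skip a merged file
    # left over from an earlier run in the same folder.
    pdfs = sorted(p for p in Path(pdf_dir).glob("*.pdf")
                  if p.resolve() != merged)
    # check=True raises CalledProcessError if either tool fails.
    subprocess.run(pdftk_merge_command(pdfs, merged_path), check=True)
    subprocess.run(pdftotext_command(merged_path, text_path), check=True)
    return Path(text_path)
```

Calling `merge_and_extract("form7_pdfs")` on a folder holding the downloaded files would write one merged PDF and one text file. Keeping the command builders separate from the code that runs them makes it easy to print and check the exact commands before running them on all 190+ files.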
