Is your target text in text form or image form?
If text: http://tabula.technology/
If image: http://www.i2ocr.com/free-online-kannada-ocr
If image with handwritten text: interns :P

--
Cheers,
Nikhil
+91-966-583-1250
Pune, India
Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
http://nikhilsheth.blogspot.in

On Wed, Aug 19, 2015 at 9:40 AM, Shree D N <shre...@oorvani.in> wrote:

> We downloaded it all manually. However unable to merge them. The text is
> in Kannada, and may be unclear even if we are able to extract it. We have
> anyway merged some data manually, from these 190+ files to existing data we
> had already prepared.
> Digitizing it all starting from the affidavit filing stage would have
> helped us greatly but sadly BBMP or EC or SEC doesn't have that system.
>
> On 18 August 2015 at 22:03, Bhanu Kamapantula <talk2k...@gmail.com> wrote:
>
>> Hi Shree,
>>
>> You would want to write a script which can scrape the data from the
>> website. This can be automated using Python, mechanize library (with
>> support for doPostBack calls as in these webpages).
>>
>> Once downloaded, PDFs can be combined using PDFtk library (one among
>> different methods). Then, XPDF might be useful to retrieve text from the
>> combined PDF.
>>
>> best,
>> Bhanu
>>
>> On Tue, Aug 18, 2015 at 4:01 AM, Shree D N <shre...@oorvani.in> wrote:
>>
>>> This link has form 7 (list of candidates) for all contesting candidates
>>> for BBMP polls 2015. Typically all pdfs, scanned and uploaded. Language:
>>> Kannada
>>> http://117.247.176.82/
>>> Is there a way to download it all and merge into one document or a table
>>> that represents the list of all candidates for all wards??
>>> We are trying to put this together because this consolidated list is not
>>> available anywhere as far as I know. Has anyone else seen it?
>>> --
>>> -------
>>> Cheers,
>>>
>>> *Shree | Associate Editor | *
>>> *Oorvani Foundation**Citizen Matters
>>> <http://bangalore.citizenmatters.in> - Bangalore's own online news magazine*
>>> Bangalore | Tel: +91-80-4173 7584 | Mobile: +91-95909 35559
>>> Follow us on Twitter <https://twitter.com/citizenmatters> | Follow us
>>> on Facebook <https://www.facebook.com/citizenmatters>
>>>
>>> --
>>> Datameet is a community of Data Science enthusiasts in India. Know more
>>> about us by visiting http://datameet.org
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "datameet" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to datameet+unsubscr...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
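
For anyone trying this later, here is a rough sketch of the merge-then-extract steps Bhanu suggests in the quoted thread, plus a crude check for the "text form or image form" question. It assumes the `pdftk` and `pdftotext` (Xpdf) command-line tools are installed and on the PATH. The function names, the `merged.pdf` output name and the 0.3 threshold are made up for illustration and are not from the thread. The scraping step is left out because it depends on the site's doPostBack forms.

```python
import shutil
import subprocess
import sys

# Unicode range of the Kannada block.
KANNADA_START, KANNADA_END = 0x0C80, 0x0CFF


def pdftk_merge_command(pdf_paths, output_path):
    """Build the PDFtk command that concatenates PDFs in the given order."""
    return ["pdftk", *pdf_paths, "cat", "output", output_path]


def pdftotext_command(pdf_path):
    """Build the pdftotext command that writes UTF-8 text to stdout ('-')."""
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def kannada_ratio(text):
    """Return the fraction of non-whitespace characters in the Kannada block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if KANNADA_START <= ord(c) <= KANNADA_END)
    return hits / len(chars)


def suggest_route(text, threshold=0.3):
    """Guess 'text' (table extraction may work) or 'image' (needs OCR).

    The threshold is an arbitrary illustrative value, not a tested one.
    """
    return "text" if kannada_ratio(text) >= threshold else "image"


def main(pdf_paths, merged_path="merged.pdf"):
    # Fail early if either external tool is missing.
    for tool in ("pdftk", "pdftotext"):
        if shutil.which(tool) is None:
            sys.exit(f"{tool} not found on PATH")
    subprocess.run(pdftk_merge_command(pdf_paths, merged_path), check=True)
    result = subprocess.run(
        pdftotext_command(merged_path), check=True, capture_output=True
    )
    text = result.stdout.decode("utf-8", errors="replace")
    print(suggest_route(text))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1:])
    else:
        print("usage: python merge_check.py file1.pdf file2.pdf ...")
```

If the files are scans, as described above, pdftotext will most likely return almost nothing and the script prints `image`, which points back to the OCR route rather than Tabula.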