Hi Shree,

You would want to write a script that scrapes the files from the website. This can be automated with Python and the mechanize library; the script will also need to handle the doPostBack calls that these pages use.
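For what it's worth, here is a minimal sketch of the postback step in Python. It uses only the standard library rather than mechanize, and it assumes the site is a typical ASP.NET WebForms page, where clicking a link calls __doPostBack(target, argument) and the browser resubmits the form with its hidden fields (such as __VIEWSTATE) plus __EVENTTARGET and __EVENTARGUMENT. The helper name build_postback_body and the control name in the usage comment are made up for illustration; the real control names have to be read from the page source.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_body(page_html, event_target, event_argument=""):
    """Build the POST body that emulates __doPostBack(target, argument).

    Assumption: the page follows the usual WebForms convention of
    carrying its state in hidden inputs that must be echoed back.
    """
    parser = HiddenFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields).encode("ascii")


# Usage sketch (not executed here; the control name is hypothetical):
#   import urllib.request
#   html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
#   body = build_postback_body(html, "GridView1$ctl02$lnkDownload")
#   pdf_bytes = urllib.request.urlopen(urllib.request.Request(url, data=body)).read()
```

The same idea carries over to mechanize: select the form, fill in the two __EVENT fields, and submit it, instead of trying to run the page's JavaScript.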
Once downloaded, the PDFs can be combined with PDFtk (one of several ways to do it). Xpdf may then be useful for extracting text from the combined PDF, though if the files are scanned images with no text layer, an OCR step would be needed first.

Best,
Bhanu

On Tue, Aug 18, 2015 at 4:01 AM, Shree D N <shre...@oorvani.in> wrote:
> This link has form 7 (list of candidates) for all contesting candidates
> for BBMP polls 2015. Typically all pdfs, scanned and uploaded. Language:
> Kannada
> http://117.247.176.82/
> Is there a way to download it all and merge into one document or a table
> that represents the list of all candidates for all wards??
> We are trying to put this together because this consolidated list is not
> available anywhere as far as I know. Has anyone else seen it?
> --
> -------
> Cheers,
>
> *Shree | Associate Editor | *
> *Oorvani Foundation**Citizen Matters <http://bangalore.citizenmatters.in>
> - Bangalore's own online news magazine*
> Bangalore | Tel: +91-80-4173 7584 | Mobile: +91-95909 35559
> Follow us on Twitter <https://twitter.com/citizenmatters> | Follow us on
> Facebook <https://www.facebook.com/citizenmatters>
>
> --
> Datameet is a community of Data Science enthusiasts in India. Know more
> about us by visiting http://datameet.org
> ---
> You received this message because you are subscribed to the Google Groups
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to datameet+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.