Hi Shree,

You could write a script to scrape the data from the website. This can
be automated in Python with the mechanize library (adding support for
the doPostBack calls these webpages use).
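
To illustrate the doPostBack part, here is a rough sketch of how such a call is usually emulated without a browser: copy the page's hidden form fields and set __EVENTTARGET and __EVENTARGUMENT yourself. It uses only the Python standard library rather than mechanize, and the function names and the example control name are hypothetical, not taken from the site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, event_argument=""):
    """Build the form payload that a __doPostBack(target, argument) call would submit."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)  # carries __VIEWSTATE etc. if present
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    return payload


def do_postback(url, html, event_target, event_argument=""):
    """POST the payload back to the page and return the raw response bytes."""
    data = urlencode(build_postback(html, event_target, event_argument)).encode("utf-8")
    with urlopen(Request(url, data=data)) as response:
        return response.read()
```

The event target is the first argument of the javascript:__doPostBack('...','...') link in the page source. Whether the response is the PDF itself or another page depends on the site, so that part would need checking by hand.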

Once downloaded, the PDFs can be combined with PDFtk (one of several
options). Then Xpdf's pdftotext tool might be useful for retrieving text
from the combined PDF.
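
A minimal sketch of the merge-and-extract step, assuming the pdftk and pdftotext command-line tools are installed and on the PATH. The folder and file names are placeholders. Note that pdftotext only recovers an embedded text layer, so scanned pages would likely need an OCR pass first.

```python
import subprocess
from pathlib import Path


def merge_command(pdf_paths, output_pdf):
    """Command line for: pdftk a.pdf b.pdf ... cat output combined.pdf"""
    return ["pdftk", *map(str, pdf_paths), "cat", "output", str(output_pdf)]


def text_command(pdf_path, output_txt):
    """Command line for pdftotext, asking for UTF-8 output (the forms are in Kannada)."""
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), str(output_txt)]


def merge_and_extract(folder, output_pdf="combined.pdf", output_txt="combined.txt"):
    """Merge every PDF in `folder` (sorted by name) and dump the text of the result."""
    out_pdf = Path(output_pdf)
    # Skip a previous combined file if it lives in the same folder.
    pdfs = sorted(
        p for p in Path(folder).glob("*.pdf") if p.resolve() != out_pdf.resolve()
    )
    if not pdfs:
        raise ValueError(f"no PDF files found in {folder}")
    subprocess.run(merge_command(pdfs, out_pdf), check=True)
    subprocess.run(text_command(out_pdf, output_txt), check=True)
```

Calling merge_and_extract("downloads") would write combined.pdf and combined.txt to the current directory; with a large number of files it may be easier to merge ward by ward.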

best,
Bhanu

On Tue, Aug 18, 2015 at 4:01 AM, Shree D N <shre...@oorvani.in> wrote:

> This link has Form 7 (list of candidates) for all contesting candidates
> in the BBMP polls 2015. They are typically all PDFs, scanned and
> uploaded. Language: Kannada
> http://117.247.176.82/
> Is there a way to download them all and merge them into one document or
> a table that represents the list of all candidates for all wards?
> We are trying to put this together because this consolidated list is not
> available anywhere as far as I know. Has anyone else seen it?
> --
> -------
> Cheers,
>
> *Shree | Associate Editor | *
> *Oorvani Foundation**Citizen Matters <http://bangalore.citizenmatters.in>
> - Bangalore's own online news magazine*
> Bangalore | Tel: +91-80-4173 7584 | Mobile: +91-95909 35559
> Follow us on Twitter <https://twitter.com/citizenmatters> | Follow us on
> Facebook <https://www.facebook.com/citizenmatters>
>
> --
> Datameet is a community of Data Science enthusiasts in India. Know more
> about us by visiting http://datameet.org
> ---
> You received this message because you are subscribed to the Google Groups
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to datameet+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



