I looked at this site some months back. Notes from my investigation then:

1. The content/JSONs are being served by a commercial GIS operated by BSNL
2. The content comes back split across many JSONs (50-60) with encoded URLs
3. The API appeared stateful (URLs kept changing)
This, I concluded, needs:

(a) phantom-like programmable browser control
(b) a network proxy/sniffer to dump all intercepted content
(c) wrangling to relate data from across all the JSONs

I also believe that there are license issues.

On May 21, 2016 3:51 PM, "Raphael Susewind" <[email protected]> wrote:

> Hi Nikhil,
>
> most likely the flash application loads something like a JSON (or CSV,
> if they are bad programmers ;-) ) from a specified API address. Use a
> network sniffer to intercept the traffic that the flashplayer generates,
> and see whether you can replicate the API.
>
> If you are lucky, you will see HTTP requests to a URL along the lines
> of http://schoolgis.nic.in/state_x/data.json?school=0000001 to
> 000014986. In that case, you can then manually scrape the JSON files (if
> need be by emulating a flashplayer's HTTP headers, though I doubt that
> they check for this).
>
> If you are unlucky, it's a more complex API - some stateful frontends for
> SQL databases can be very nasty to replicate, for instance. One brute
> force kind of solution in such cases would be to write a custom proxy
> server (there are python/perl/... modules for this) - i.e. a kind of
> customized sniffer - and route your browser traffic through this, then
> automate the browser (again, there are plugins for firefox and chrome
> that have corresponding python or perl interfaces), and intercept the
> traffic generated. That's the solution I found to scrape polling station
> localities from the ECI server (before they put a bold copyright
> disclaimer on it - now this kind of scraping would probably be illegal -
> so do check these issues as well).
>
> Let us know what you find out about the API,
>
> Best of luck,
> Raphael
>
> On 21.05.2016 11:43, Nikhil VJ wrote:
>
> > Hi friends,
> >
> > is there any way to extract data from such a flash player platform...as
> > follows...
> >
> > schoolgis.nic.in <http://schoolgis.nic.in/>
> >
> > --regards,
> > Nikhil VJ
> > Pune
> >
> > --
> > Datameet is a community of Data Science enthusiasts in India. Know more
> > about us by visiting http://datameet.org
> > ---
> > You received this message because you are subscribed to the Google
> > Groups "datameet" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to [email protected]
> > <mailto:[email protected]>.
> > For more options, visit https://groups.google.com/d/optout.
>
> --
> Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
> Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
> Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
> Impact | https://impactstory.org/raphael-susewind
>
> Please consider https://www.gnupg.org for encryption (key id 10AEE42F)
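
For what it is worth, step (c) above, relating data across all the intercepted JSONs, might look roughly like the sketch below once the proxy has dumped every response to a directory. This is only an illustration under assumptions: I am assuming each dump holds either one JSON object or a list of objects, and that related objects share an identifier. The field name `school_id` and the function name `merge_json_dumps` are hypothetical; the real payloads would need to be inspected first.

```python
import json
from collections import defaultdict
from pathlib import Path


def merge_json_dumps(dump_dir, key="school_id"):
    """Merge records from every *.json file in dump_dir on a shared key.

    Assumption: each file holds a JSON object or a list of objects, and
    related objects carry the same value under `key` (hypothetical name).
    """
    merged = defaultdict(dict)
    for path in sorted(Path(dump_dir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            records = json.load(fh)
        if isinstance(records, dict):
            records = [records]  # treat a single object as a one-item list
        if not isinstance(records, list):
            continue  # not a shape this sketch knows how to handle
        for record in records:
            if not isinstance(record, dict) or key not in record:
                continue  # skip anything without the join key
            # Later files overwrite earlier ones on conflicting fields.
            merged[record[key]].update(record)
    return dict(merged)
```

Calling `merge_json_dumps("dumps/")` would return one dictionary per identifier, combining whatever fields the separate responses contributed. Whether a single join key exists at all in this API is something only the captured traffic can confirm.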
