Data from Mission Antyodaya is already available on the India Data Portal site. I am not sure whether it is complete, but you can check.
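In case the portal copy turns out to be incomplete, here is a rough sketch of the approach Nikhil and Sanjay describe in the quoted thread below: build the form payload for the level you want, POST it, and pull the next level's (code, name) pairs out of the onClick handlers. The field names, the URL and the selectState(...) pattern are taken from their messages. The helper names (build_payload, parse_select_state, fetch_report) are my own, and I am assuming the site still behaves as it did then and needs no session cookie. The handlers for districts, blocks and GPs have other names that I have not checked, so the regex only covers the state-level case.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://missionantyodaya.nic.in/"

# Form fields as listed in the thread; an empty value means "not drilled down yet".
FIELDS = ("stateCode", "stateName", "districtCode", "districtName",
          "blockCode", "blockName", "gpCode", "gpName")


def build_payload(state=None, district=None, block=None, gp=None):
    """Build the form payload; each argument is a (code, name) pair or None."""
    payload = dict.fromkeys(FIELDS, "")
    levels = (("state", state), ("district", district), ("block", block), ("gp", gp))
    for prefix, pair in levels:
        if pair is not None:
            code, name = pair
            payload[prefix + "Code"] = str(code)
            payload[prefix + "Name"] = name
    return payload


# Matches handlers like:
# selectState(2,'HIMACHAL PRADESH','preloginDistrictInfrastructureReports2020.html')
HANDLER_RE = re.compile(
    r"selectState\(\s*(\d+)\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


def parse_select_state(html):
    """Return (code, name, action) tuples for every selectState handler found."""
    return [
        (int(code), " ".join(name.split()), action)  # collapse wrapped whitespace
        for code, name, action in HANDLER_RE.findall(html)
    ]


def fetch_report(action, payload, timeout=30):
    """POST the payload as application/x-www-form-urlencoded and return the HTML.

    Assumption: no login or session cookie is required for these pre-login pages.
    """
    data = urlencode(payload).encode("ascii")
    request = Request(
        BASE_URL + action,
        data=data,  # supplying data makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, build_payload(state=(27, "MAHARASHTRA"), district=(469, "AURANGABAD")) gives the district-level payload from Nikhil's message, which could then be passed to fetch_report together with the report page name. Since the pagination is client-side only, the returned HTML should already contain every row of the table. Please add a delay between requests if you loop over many blocks or GPs.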
On Friday, 11 February 2022 at 05:34:36 UTC+5:30 [email protected] wrote:
> Hello Seniors,
> Where can I get the city shapefiles for Tarapur, Aurangabad, and Nashik?
> All of these cities are in Maharashtra State.
>
> Uzair
>
> On Mon, Feb 7, 2022 at 12:42 PM Piyush Kumar <[email protected]> wrote:
>
>> Thank you Sanjay and Nikhil. I think these are good starting points to
>> try and figure out how to get this done and I am sure with some time and
>> effort, it is possible.
>>
>> Piyush
>>
>> On Sun, 6 Feb 2022 at 17:48, Nikhil VJ <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I don't think Selenium is required - this looks like it can be done by
>>> just varying the request payload of one POST API call.
>>> POST API call to URL:
>>> https://missionantyodaya.nic.in/preloginVillageInfrastructureReports2020.html
>>> The POST request content type is application/x-www-form-urlencoded.
>>>
>>> At *state level*, the request payload is like:
>>> stateCode: 27
>>> stateName: MAHARASHTRA
>>> districtCode:
>>> districtName:
>>> blockCode:
>>> blockName:
>>> gpCode:
>>> gpName:
>>>
>>> At *district level* it becomes:
>>> stateCode: 27
>>> stateName: MAHARASHTRA
>>> districtCode: 469
>>> districtName: AURANGABAD
>>> blockCode:
>>> blockName:
>>> gpCode:
>>> gpName:
>>>
>>> Then *block level*:
>>> stateCode: 27
>>> stateName: MAHARASHTRA
>>> districtCode: 469
>>> districtName: AURANGABAD
>>> blockCode: 4315
>>> blockName: KHULTABAD
>>> gpCode:
>>> gpName:
>>>
>>> Then *GP level*:
>>> stateCode: 27
>>> stateName: MAHARASHTRA
>>> districtCode: 469
>>> districtName: AURANGABAD
>>> blockCode: 4315
>>> blockName: KHULTABAD
>>> gpCode: 170584
>>> gpName: BODKHA
>>>
>>> If in Python, one can use BeautifulSoup to capture the table data as
>>> well as get the (code + name) pairs for the next level.
>>>
>>> --
>>> Cheers,
>>> Nikhil VJ
>>> https://nikhilvj.co.in
>>>
>>>
>>> On Fri, Feb 4, 2022 at 1:42 PM Sanjay Bhangar <[email protected]>
>>> wrote:
>>>
>>>> Piyush -
>>>>
>>>> You could write a Python (or your preferred language) script that just
>>>> requests the HTML, parses it, and follows the hierarchy, without using
>>>> Selenium. This could be a bunch of work as the site doesn't use regular
>>>> links with GET requests. Instead, when you click on a state in the
>>>> table, it uses JavaScript to fill up hidden form fields with the state
>>>> code, etc. and then does a form submit, causing a POST request to be
>>>> made with those values.
>>>>
>>>> For example, you can see the links in the table have an onClick handler
>>>> like
>>>> "selectState(2,'HIMACHAL PRADESH','preloginDistrictInfrastructureReports2020.html')".
>>>>
>>>> Then, in the JavaScript, you can see the selectState function defined
>>>> like so:
>>>>
>>>> function selectState(stateCode,stateName,action){
>>>>     $("#stateCode").val(stateCode);
>>>>     $("#stateName").val(stateName);
>>>>     $("#reportForm").attr('action', action);
>>>>     $("#reportForm").submit();
>>>> }
>>>>
>>>> In this JS file:
>>>> https://missionantyodaya.nic.in/resources/antyodaya/js/custom/prelogin/reports/preloginReport.js
>>>>
>>>> So this will make a POST request to
>>>> preloginDistrictInfrastructureReports2020.html
>>>> with stateCode=2, stateName=HIMACHAL PRADESH
>>>>
>>>> Similarly, there are different onClick handlers defined for selecting
>>>> districts, etc. that you can follow down to see what URLs they are calling
>>>> with what parameters. And in theory, you could write some HTML parsing
>>>> code and some regex to go through the items in each table, parse out the
>>>> parameters and URLs to call, and follow things down.
>>>>
>>>> So, in theory you could write this without mucking around with
>>>> Selenium, but it also seems like a lot more work than if the site was
>>>> structured "normally" with unique URLs and GET requests.
>>>>
>>>> For the page numbering, this seems okay: the HTML outputs all the items
>>>> across all the pages, and then the actual pagination on the page is purely
>>>> client-side JavaScript - so if you were to read the HTML on the page via
>>>> Python or so, you would just get all the items in the table without having
>>>> to worry about pagination.
>>>>
>>>> Unfortunately, this does seem like a lot of work and I don't really
>>>> have the time to do anything, but it seemed like an interesting problem
>>>> and I was curious so I took a look. Hope it could help a bit.
>>>>
>>>> All the best,
>>>> Sanjay
>>>>
>>>> On Fri, Feb 4, 2022 at 1:03 PM Piyush Kumar <[email protected]>
>>>> wrote:
>>>>
>>>>> Could folks here suggest how to go about this?
>>>>>
>>>>> https://missionantyodaya.nic.in/preloginStateInfrastructureReports2020.html
>>>>>
>>>>> When we click this link, we get data on village-level infrastructure
>>>>> put within multiple HTML tables across many pages (separated into state,
>>>>> dist., block etc.)
>>>>>
>>>>> Suppose I want to scrape data up to the village level for a particular
>>>>> state, is there any way I can get it done without too much back and forth
>>>>> over Selenium webdriver? Please note that to access village level data
>>>>> you have to go through a nested hierarchy of links (gram panchayat within
>>>>> block, which is within a district and so on). To make matters more
>>>>> complicated, the pages have also not been numbered.
>>>>>
>>>>> Can someone in the know help me figure this out?
>>>>>
>>>>> Thanks in advance
>>>>> Piyush
>>>>>
>>>>> --
>>>>> Datameet is a community of Data Science enthusiasts in India.
>>>>> Know more about us by visiting http://datameet.org
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "datameet" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/datameet/CAFtOtdujRhq36O4SW%3Dtie%2BSDH_6Pq1R87B6nVerzU4giQVka%3Dw%40mail.gmail.com
