Hi,

I'm looking at Maharashtra's land records portal: https://mahabhulekh.maharashtra.gov.in
and wondering whether it's possible to scrape data from it.

Here is the workflow:

1. Choose 7/12 (७/१२).
2. Select any जिल्हा (district) > तालुका (taluka) > गाव (village).
3. Under शोध (search), select सर्वे नंबर / गट नंबर (survey number / gat number), which is the first option.
4. Type 1 in the text box and press the "शोधा" (search) button. A dropdown then appears with options like 1/1, 1/2, 1/3, etc.
5. Select any option and click "७/१२ पहा" (view 7/12). A new window/tab opens (you have to enable popups) containing static HTML content (some tables).

I need to capture this content. The URL is always the same:
https://mahabhulekh.maharashtra.gov.in/Konkan/pg712.aspx
but the content changes depending on the options chosen.

With the browser's "Inspect Element" > Network tab open, clicking the final button shows a request to this URL:
https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx/call712
and the request params / payload look like this:

    {'sno':'1','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}

When you change the survey/gat number to 1/10, the params change like so:

    {'sno':'1#10','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}

For 1/1अ:

    {'sno':'1#1अ','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}

I tried some wget and curl commands, but no luck so far. Do let me know if you can make some headway.

It would also be great to learn how to extract the list of districts, the talukas (subdistricts) in each district, and the villages in each taluka. I'm dumping other info at the bottom in case it helps.

Why do this:
At present it's just an exploration following on from our work on village shapefiles. The district > taluka > village mapping from official land records data could serve as a good source for triangulation.
Then, while I don't see myself going deeper into this right now, I am aware that land records / ownership has major corruption, entanglements and other issues precisely because of the lack of transparency. The mahabhulekh website itself is a significant step forward in making this sector a little more transparent, and more push in this direction would probably do more good IMHO. At some point GIS/lat-long info might come in, and it would be good to bring the data to a level that is ready for it.

Data dump:
When we press the button to fetch the 7/12 (saatbarah) record, the console records a POST with these parameters:

Copy as cURL:

    curl 'https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx/call712' \
      -H 'Host: mahabhulekh.maharashtra.gov.in' \
      -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0' \
      -H 'Accept: application/json, text/plain, */*' \
      -H 'Accept-Language: en-US,en;q=0.5' \
      --compressed \
      -H 'Content-Type: application/json;charset=utf-8' \
      -H 'Referer: https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx' \
      -H 'Content-Length: 170' \
      -H 'Cookie: ASP.NET_SessionId=3ozsnwd3nhh4py4hmiqcjeoc' \
      -H 'Connection: keep-alive' \
      -H 'Pragma: no-cache' \
      -H 'Cache-Control: no-cache'

Copy POST data:

    {'sno':'1#1अ','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}

Request headers:

    POST /Konkan/Home.aspx/call712 HTTP/1.1
    Host: mahabhulekh.maharashtra.gov.in
    User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0
    Accept: application/json, text/plain, */*
    Accept-Language: en-US,en;q=0.5
    Accept-Encoding: gzip, deflate
    Content-Type: application/json;charset=utf-8
    Referer: https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx
    Content-Length: 170
    Cookie: ASP.NET_SessionId=3ozsnwd3nhh4py4hmiqcjeoc
    Connection: keep-alive
    Pragma: no-cache
    Cache-Control: no-cache

Response headers:

    HTTP/1.1 200 OK
    Cache-Control: private, max-age=0
    Content-Type: application/json; charset=utf-8
    Server: Microsoft-IIS/8.0
    X-Powered-By: ASP.NET
    Date: Mon, 24 Oct 2016 15:31:40 GMT
    Content-Length: 10

Copy Response:

    {"d":null}

--
Cheers,
Nikhil
+91-966-583-1250
Pune, India
Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
Blog <http://nikhilsheth.blogspot.in> | Contribute <https://www.payumoney.com/webfronts/#/index/NikhilVJ>

--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
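P.S. A possible starting point for replaying the call712 request from Python, as a minimal and untested sketch. Two things in it are guesses on my part, not confirmed behaviour of the site: (a) that the server remembers the selection in the ASP.NET session after the POST (which would explain the empty {"d":null} response and the fixed pg712.aspx URL), so the three requests must share one cookie jar; and (b) that every "/" in the dropdown label becomes "#" in sno. The function names (to_sno, build_body, build_request, fetch_712) are made up for illustration. Note also that the "Copy as cURL" command above carries no POST data, so it would need the payload added before it could be replayed.

```python
import http.cookiejar
import urllib.request

BASE = "https://mahabhulekh.maharashtra.gov.in/Konkan"

# Field order as it appears in the captured payload.
FIELDS = ("sno", "vid", "dn", "tn", "vn", "tc", "dc", "did", "tid")

# Village parameters copied from the captured request.
VILLAGE = {
    "vid": "273200030398260000", "dn": "रत्नागिरी", "tn": "खेड",
    "vn": "वाळंजवाडी", "tc": "3", "dc": "32", "did": "32", "tid": "3",
}


def to_sno(survey_no):
    """Turn a dropdown label such as '1/10' into the captured form '1#10'.

    Assumption: every '/' maps to '#'; only single-slash cases were observed.
    """
    return survey_no.replace("/", "#")


def build_body(survey_no, village):
    """Build the single-quoted body in the same shape the browser sent."""
    values = dict(village, sno=to_sno(survey_no))
    pairs = ",".join("'%s':'%s'" % (key, values[key]) for key in FIELDS)
    return "{" + pairs + "}"


def build_request(body):
    """Prepare the POST to Home.aspx/call712 with the main captured headers."""
    return urllib.request.Request(
        BASE + "/Home.aspx/call712",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json;charset=utf-8",
            "Accept": "application/json, text/plain, */*",
            "Referer": BASE + "/Home.aspx",
        },
        method="POST",
    )


def fetch_712(survey_no, village):
    """Untested guess at the full flow: one session, three requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.open(BASE + "/Home.aspx").read()  # should set ASP.NET_SessionId
    opener.open(build_request(build_body(survey_no, village))).read()
    # If the session guess is right, this page now holds the 7/12 tables.
    page = opener.open(BASE + "/pg712.aspx").read()
    return page.decode("utf-8", errors="replace")
```

Calling fetch_712("1/1अ", VILLAGE) would then return the HTML of the record page if the guesses hold; build_body on its own can be checked offline against the payloads pasted above.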
