Hi Nikhil,

I have been thinking along similar lines for Telangana (http://mabhoomi.telangana.gov.in/) and have spoken to local land activists and researchers.

Why do this? First, to keep a dump of the records: they are changing very fast in Telangana because of the huge number of surveys being done, we have no clue how the records are changing, and only the final changes are in the public domain. Second, we have been running a farmer distress helpline in Vikarabad District, Telangana, for the last 7 months, and 50% of the issues we get are land issues, so this would also make the records easier to access. Third, to understand and analyse the land acreage: who owns it, and who cultivates or benefits from it (currently noted in the 13th column of the pahani). As we have been working on the rights of tenant farmers, this is an important data point that we need to understand.
So we would be eager to know how we can collaborate and take this forward. We can take help from Srinivas Kodali ([email protected]), who has offered locally to help and has experience in scraping.

Cheers,
SreeHarsha

On Monday, October 23, 2017 at 2:55:35 PM UTC+5:30, Devendra Damle wrote:
>
> Hi Nikhil.
>
> A colleague of mine wrote a python script for scraping data from the Debt
> Recovery Tribunals website. The problem was similar to yours.
>
> His script uses selenium web driver, and gecko drivers for firefox. It
> opens the website in firefox, then simulates clicks to select things from
> drop-down menus to generate tables, and then downloads the data in it to a
> JSON file. I am attaching the source code file. I am myself not a coder, so
> I won't be able to help you with the code itself, but you might be able to
> modify it to suit your needs.
>
> Regards,
> Devendra
>
> On Monday, October 24, 2016 at 10:09:58 PM UTC+5:30, Nikhil VJ wrote:
>>
>> Hi,
>>
>> I'm looking at Maharashtra's land records portal:
>> https://mahabhulekh.maharashtra.gov.in
>>
>> .. and wondering if it's possible to scrape data from here?
>>
>> Will share a workflow:
>> choose 7/12 (७/१२) > select any जिल्हा (district) > तालुका (taluka) > गाव (village)
>> select शोध (search) : सर्वे नंबर / गट नंबर (survey number / gat number; first option)
>> type 1 in the text box and press the "शोधा" (search) button
>> Then we get a dropdown with options like 1/1, 1/2, 1/3 etc.
>>
>> On selecting any and clicking "७/१२ पहा" (view 7/12),
>> a new window/tab opens up (you have to enable popups), having static
>> HTML content (some tables). I need to capture this content.
>>
>> The URL is always the same:
>> https://mahabhulekh.maharashtra.gov.in/Konkan/pg712.aspx
>> ..but the content changes depending on the options chosen.
>>
>> On using the browser's "Inspect Element" > Network and clicking the
>> final button, there is a request to this URL:
>>
>> https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx/call712
>>
>> and the request Params / Payload is like:
>>
>> {'sno':'1','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
>>
>> when you change the survey/gat number to 1/10, the params change like so:
>> {'sno':'1#10','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
>>
>> for 1/1अ:
>> {'sno':'1#1अ','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
>>
>> I tried some wget and curl commands but no luck so far. Do let me know
>> if you can make some headway.
>>
>> Also, it would be great to learn how to extract the list of
>> districts, talukas (subdistricts) in each district, and villages in
>> each taluka.
>>
>> dumping other info at bottom if it helps.
>>
>> Why do this:
>> At present it's just an exploration following on from our work on
>> village shapefiles.
>> The district > taluka > village mapping data from official Land
>> Records data could serve as a good source for triangulation.
>> Then, while I don't see myself going deeper into this right now, I am
>> aware that land records / ownership has major corruption,
>> entanglements and other issues precisely because of the lack of
>> transparency. The mahabhulekh website itself is a significant step
>> forward in making this sector a little more transparent, and more push
>> in this direction would probably do more good IMHO. At some point
>> GIS/lat-long info might come in, and it would be good to bring the
>> data to a level that is ready for it.
>>
>> Data dump:
>> When we press the button to fetch the 7/12 (saatbarah) record, the
>> console records a POST with these parameters:
>>
>> Copy as cURL:
>> curl 'https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx/call712' \
>>   -H 'Host: mahabhulekh.maharashtra.gov.in' \
>>   -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0' \
>>   -H 'Accept: application/json, text/plain, */*' \
>>   -H 'Accept-Language: en-US,en;q=0.5' \
>>   --compressed \
>>   -H 'Content-Type: application/json;charset=utf-8' \
>>   -H 'Referer: https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx' \
>>   -H 'Content-Length: 170' \
>>   -H 'Cookie: ASP.NET_SessionId=3ozsnwd3nhh4py4hmiqcjeoc' \
>>   -H 'Connection: keep-alive' \
>>   -H 'Pragma: no-cache' \
>>   -H 'Cache-Control: no-cache'
>>
>> Copy POST data:
>> {'sno':'1#1अ','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
>>
>> request headers:
>> POST /Konkan/Home.aspx/call712 HTTP/1.1
>> Host: mahabhulekh.maharashtra.gov.in
>> User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0
>> Accept: application/json, text/plain, */*
>> Accept-Language: en-US,en;q=0.5
>> Accept-Encoding: gzip, deflate
>> Content-Type: application/json;charset=utf-8
>> Referer: https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx
>> Content-Length: 170
>> Cookie: ASP.NET_SessionId=3ozsnwd3nhh4py4hmiqcjeoc
>> Connection: keep-alive
>> Pragma: no-cache
>> Cache-Control: no-cache
>>
>> response headers:
>> HTTP/1.1 200 OK
>> Cache-Control: private, max-age=0
>> Content-Type: application/json; charset=utf-8
>> Server: Microsoft-IIS/8.0
>> X-Powered-By: ASP.NET
>> Date: Mon, 24 Oct 2016 15:31:40 GMT
>> Content-Length: 10
>>
>> Copy Response:
>> {"d":null}
>>
>> --
>> Cheers,
>> Nikhil
>> +91-966-583-1250
>> Pune, India
>> Self-designed learner at Swaraj University <
>> http://www.swarajuniversity.org>
>> Blog <http://nikhilsheth.blogspot.in> | Contribute
>> <https://www.payumoney.com/webfronts/#/index/NikhilVJ>
>>
>
--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
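
A note on the {"d":null} response in the data dump: one possible reading, which is an assumption and not something confirmed anywhere in this thread, is that call712 only stores the selection in the server-side ASP.NET session, and pg712.aspx then renders whatever that session holds. Under that assumption, a minimal Python sketch (standard library only) would load Home.aspx to obtain a session cookie, POST the single-quoted payload to call712, and then request pg712.aspx with the same cookie. The function names build_payload and fetch_712 are hypothetical; the URLs, headers and field names are copied from the data dump, and the rule that "/" becomes "#" in the survey number is inferred from the three payloads quoted there.

```python
import http.cookiejar
import urllib.request

# Base path copied from the thread; other divisions may use a different path.
BASE = "https://mahabhulekh.maharashtra.gov.in/Konkan"


def build_payload(sno, vid, dn, tn, vn, tc, dc, did, tid):
    """Build the request body in the single-quoted form seen in the browser.

    The observed payloads show "/" in the survey number replaced by "#"
    (1/10 was sent as 1#10), so the same substitution is applied here.
    """
    fields = [
        ("sno", sno.replace("/", "#")),
        ("vid", vid),
        ("dn", dn),
        ("tn", tn),
        ("vn", vn),
        ("tc", tc),
        ("dc", dc),
        ("did", did),
        ("tid", tid),
    ]
    return "{" + ",".join("'%s':'%s'" % (key, value) for key, value in fields) + "}"


def fetch_712(payload, timeout=30):
    """Return the HTML of pg712.aspx for the selection described by payload.

    ASSUMPTION: the server keeps the selection in the session identified by
    the ASP.NET_SessionId cookie, so all three requests share one cookie jar.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # Step 1: load the home page so the server issues a session cookie.
    opener.open(BASE + "/Home.aspx", timeout=timeout).close()

    # Step 2: send the selection; the thread reports {"d":null} as the reply.
    request = urllib.request.Request(
        BASE + "/Home.aspx/call712",
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json;charset=utf-8",
            "Accept": "application/json, text/plain, */*",
            "Referer": BASE + "/Home.aspx",
        },
        method="POST",
    )
    opener.open(request, timeout=timeout).close()

    # Step 3: fetch the record page within the same session.
    with opener.open(BASE + "/pg712.aspx", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

To try it, pass the values from the data dump to build_payload (for example sno "1/10" with the same vid, dn, tn, vn, tc, dc, did and tid) and hand the result to fetch_712. If the page that comes back is empty or an error page, the session assumption is probably wrong for this site, and the selenium approach Devendra describes would be the fallback.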
