Devdatta,
I lined up a map for Orissa (it took about 5 minutes; I could probably do a little better with more time). I used a district shapefile from their government (or possibly the Survey of India), which I believe has a custom projection. They had it up a few weeks ago, then the website disappeared... I rectified the map, ran an unsupervised classification, then vectorized the result. I haven't gone in and cleaned up the data yet, but do you think this would be worth developing? I would rather have software do all the work than trace lines myself. Let me know your thoughts.
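The message does not say which software or classifier was used for the unsupervised classification step. As a rough illustration only, the sketch below swaps in a plain 1-D k-means over grayscale pixel values to separate dark boundary ink from light paper. The `scan` grid, the function names, and the choice of two classes are all assumptions made for the example; rectification and vectorizing are separate steps and are not shown.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Cluster scalar values into k groups with plain Lloyd's k-means."""
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly across the value range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Keep the old centre if a cluster ends up empty.
        new_centres = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centres)]
        if new_centres == centres:
            break
        centres = new_centres
    return centres


def classify(grid, centres):
    """Label each pixel with the index of its nearest centre."""
    return [[min(range(len(centres)), key=lambda i: abs(v - centres[i]))
             for v in row]
            for row in grid]


# Hypothetical 5x5 grayscale scan: low values stand in for boundary ink,
# high values for blank paper. Real scans are far larger and noisier.
scan = [
    [250, 248, 30, 251, 249],
    [247, 252, 25, 250, 246],
    [35, 28, 22, 31, 40],
    [249, 251, 27, 248, 250],
    [252, 247, 33, 249, 251],
]
pixels = [v for row in scan for v in row]
centres = kmeans_1d(pixels, k=2)
labels = classify(scan, centres)  # class 0 = darker cluster, 1 = lighter
```

The labelled grid is what a vectorizing step would then trace into lines or polygons. On a real scanned map, text labels and hatching usually fall into the same class as the boundaries, which is one reason manual cleanup is still needed afterwards.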
Data is here: https://app.box.com/s/lfeg76yxkcqpyixojorg

Justin

On Friday, July 18, 2014 12:16:31 AM UTC-4, Justin Meyers wrote:
>
> Devdatta,
> Yikes! I was really hoping there was some dataset(s) out there that
> actually made sense... Even the census tables from
> http://censusindia.gov.in/ have duplicates, and it isn't always clear
> which record should be used. Some of the village-level data I have seen
> shows the wrong tehsil code in a central town (let's say the town code is
> 33333333xxxxxxxx, while all the surrounding villages have codes that are
> 33444444xxxxxxxx). I have worked with some wild data in the past, but
> India seems like a nightmare. What it will most likely come down to is
> that it will make sense that it doesn't make sense... if that makes
> sense!?!
>
> I think I need to collect my thoughts on all this and re-calibrate. I
> have a couple of ideas, but I tried them and the results didn't make
> sense (so maybe they are correct (makes sense that it doesn't make
> sense...!??)).
>
> I'll keep you posted. If you come up with anything, or find additional
> resources, please let me know.
>
> Cheers!
> Justin
>
> On Thursday, July 17, 2014 11:45:29 PM UTC-4, Devdatta Tengshe wrote:
>>
>> Hi Justin,
>>
>> It's very hard to look at Survey of India digital data and preserve your
>> sanity. As you have found out, the boundaries of the different
>> administrative levels do not match. There are many reasons for this, and
>> not all of them are solvable.
>>
>> The boundaries in the PDFs are generalised, no doubt, but if one takes
>> care while digitizing at the correct scale, one shouldn't have many
>> problems. See the district shapefiles on the GitHub repo. They were made
>> using a top-down procedure. I used the PC boundaries for the country and
>> state boundaries. The individual district boundaries were made by
>> referring to these very Census maps, as well as tehsil boundaries.
>> I also used a custom tool that I developed, which helps in cutting one
>> polygon based on another polygon, and which tremendously cut down the
>> time I spent on creating these internal boundaries. So while the
>> district boundaries might be generalised in some cases, that's the best,
>> most up-to-date shapefile I know of today.
>>
>> Having worked with government departments, I have learnt that getting
>> data is itself a big task. Any data is a boon. And once I get the data,
>> I don't expect it to match anything else. With this paradigm, the Census
>> maps are a goldmine for me.
>>
>> Regards,
>> Devdatta
>>
>>
>> On Fri, Jul 18, 2014 at 8:58 AM, Justin Meyers <justinell...@gmail.com>
>> wrote:
>>
>>> Devdatta,
>>> Thanks for the quick response. I thought the files originated from the
>>> Survey of India, but wasn't certain. I started to create a villages
>>> dataset, but the tehsils do not really align with what the 2001 census
>>> villages state their respective parent tehsil is... So I am assuming
>>> all of the data from the government is a mixed bag (spelling may be
>>> off, codes may be wrong or outdated, data may be mixed between years).
>>> What a mess!?!?! As for rectifying and creating maps based off the
>>> PDFs, I'm not sure I would do that. The lines they have for boundaries
>>> are very, very generalized. Also, I tried (a few years ago) to line
>>> them up with actual vector data, and there is a huge shift (I was using
>>> WGS84 vector data, so maybe I should have reprojected).
>>>
>>> Maybe it would be best to start top down or bottom up: either build a
>>> dataset from villages up to states, or from states down to villages.
>>>
>>> Thoughts? We need some official data though (which seems impossible to
>>> find...). But anything is possible, right!?!
>>>
>>> Cheers,
>>> Justin
>>>
>>> On Thursday, July 17, 2014 11:17:33 PM UTC-4, Devdatta Tengshe wrote:
>>>
>>>> Hi Justin,
>>>> I know the euphoria that one has when one has done something new.
>>>> It's one of the best things in the world.
>>>>
>>>> If the original source you mentioned is Bhuvan, then the files came
>>>> directly from the Survey of India. I have used those files before, and
>>>> as you mentioned, there were only some 2,000-odd features in them.
>>>>
>>>> They are not from any specific era. Some tehsils in the file were
>>>> created after 2001, while others created in the '90s were not present.
>>>>
>>>> The only exhaustive source I know of is the Census Administrative
>>>> Atlas. It has maps in PDF format, not shapefiles, and I used it to
>>>> create the district shapefiles which are shared on the datameet GitHub
>>>> repos.
>>>> Sometimes I feel I should get started on digitizing those PDFs. It
>>>> shouldn't take more than 40 hours.
>>>>
>>>> Regards,
>>>> Devdatta Tengshe
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 8:27 AM, Justin Meyers <justinell...@gmail.com>
>>>> wrote:
>>>>
>>>>> Devdatta,
>>>>> Sorry I didn't type that up. I had just finished processing it, was
>>>>> excited, and posted. The previous file I posted had 2,693 features.
>>>>> This file has 2,739 features. Initially I thought the data was
>>>>> relevant to 2001, but maybe it is 1991. I have no metadata, and the
>>>>> Indian government does not respond to my e-mails (I have sent at
>>>>> least a dozen). I am not certain of the exact source; it is hosted by
>>>>> Bhuvan (who do not respond to e-mails either...).
>>>>>
>>>>> As for processing, I took the data and sorted out the attributes
>>>>> (they were all attached as one long string, so I split it and created
>>>>> the fields).
>>>>>
>>>>> Any other questions? If you know of a more current dataset, please
>>>>> post!!
>>>>>
>>>>>
>>>>> On Thursday, July 17, 2014 9:19:41 PM UTC-4, Devdatta Tengshe wrote:
>>>>>
>>>>>> Hi Justin,
>>>>>>
>>>>>> Can you let us know what procedure was used to create this file, and
>>>>>> up to which date it is accurate?
>>>>>> I'm asking because this shapefile has 2739 sub-districts, and
>>>>>> according to the census, there should be 5564.
>>>>>>
>>>>>> Regards,
>>>>>> Devdatta Tengshe
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 17, 2014 at 11:15 PM, Justin Meyers <
>>>>>> justinell...@gmail.com> wrote:
>>>>>>
>>>>>>> https://app.box.com/s/486rvabh3sjviiynbyu4
>>>>>>>
>>>>>>>
>>>>>>> Cheers!
>>>>>>>
>>>>>>> --
>>>>>>> Datameet is a community of Data Science enthusiasts in India. Know
>>>>>>> more about us by visiting http://datameet.org
>>>>>>> ---
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "datameet" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to datameet+u...@googlegroups.com.
>>>>>>>
>>>>>>> For more options, visit https://groups.google.com/d/optout.
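
The tehsil-code mismatch described earlier in the thread (a town coded 33333333xxxxxxxx surrounded by villages coded 33444444xxxxxxxx) is the kind of inconsistency that can be flagged automatically. The sketch below is only an illustration: it assumes that the first 8 digits of a code identify the tehsil, that each record already carries a tehsil name, and that the most common prefix within a tehsil is the correct one. None of these assumptions is confirmed in the thread, and the records and the `flag_prefix_mismatches` name are made up for the example.

```python
from collections import Counter, defaultdict


def flag_prefix_mismatches(records, prefix_len=8):
    """Flag places whose code prefix differs from their tehsil's majority prefix.

    records: iterable of (place_name, tehsil_name, code) tuples.
    prefix_len: assumed number of leading digits that identify the tehsil.
    """
    by_tehsil = defaultdict(list)
    for name, tehsil, code in records:
        by_tehsil[tehsil].append((name, code[:prefix_len]))

    flagged = []
    for tehsil, members in by_tehsil.items():
        # Treat the most common prefix in the group as the expected one.
        majority, _ = Counter(p for _, p in members).most_common(1)[0]
        for name, prefix in members:
            if prefix != majority:
                flagged.append((name, tehsil, prefix, majority))
    return flagged


# Hypothetical records mirroring the pattern described in the thread.
records = [
    ("Central Town", "Tehsil A", "3333333300000001"),
    ("Village 1", "Tehsil A", "3344444400000002"),
    ("Village 2", "Tehsil A", "3344444400000003"),
    ("Village 3", "Tehsil A", "3344444400000004"),
]
mismatches = flag_prefix_mismatches(records)
for name, tehsil, found, expected in mismatches:
    print(f"{name} ({tehsil}): prefix {found}, expected {expected}")
```

A flagged record is only a candidate for review. The majority prefix can itself be wrong, for example when a tehsil was split or renumbered between census years.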