Hi Justin and others,

All the very best for this effort.

I want to share our (the Pune chapter's) experience with the
Maharashtra villages data. This might also help resolve some of the
questions filed as issues on our GitHub repo.

To start with, there was considerable mismatch between the shapefile we got
and the census 2011 village data. We treat the census data as the benchmark
because it is a unique, well-ordered and government-ratified dataset that is
published openly, whereas the shapefile is sourced informally. We went with
2011 because, all things considered, 2011 is closer to present reality than
2001.

After a few attempts by me at "fixing" things, we decided it was best not
to mess with the shapes, as we were crossing many points of no return in
the process and the output was looking more like Swiss cheese than a map.
Instead, we could just add new columns/attributes to indicate the
recommended changes. The agenda shifted from fixing things and making a
proper map to documenting what was wrong and where corrections are
needed, in the hope of someday sending the feedback upstream and getting
the appropriate government agency (MRSAC in this case) to fix it. And if we
do publish a "fixed" version at some point, we need full documentation of
exactly what changes have been made, so that there is traceability.
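
As a rough illustration of the "add new columns instead of editing" idea, here is a minimal Python sketch. The helper name, the "_fix" / "_note" column suffixes and the sample row are all made up for this example; it is not the layout we actually used in the tracking sheet.

```python
def recommend_change(record, field, suggested, reason):
    """Return a copy of record with the suggestion stored in new columns.

    The original attribute is never overwritten, so every proposed fix
    can be traced back to the value that came with the shapefile.
    """
    updated = dict(record)
    updated[field + "_fix"] = suggested   # recommended value (hypothetical column name)
    updated[field + "_note"] = reason     # why the change is recommended
    return updated


# Hypothetical attribute row; the field names and values are invented.
village = {"NAME": "Examplgaon", "CENSUS_CD": None}
flagged = recommend_change(
    village, "NAME", "Examplegaon", "spelling differs from census 2011"
)
```

Because the source value stays in place next to the recommendation, the same table can serve both as feedback for the upstream agency and as a change log for any "fixed" version.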

And so we set up this tracking sheet
<https://docs.google.com/spreadsheets/d/1vryZTdPOXEblEac_zZw_erSOTYqy5qY-rG_V1bRCA8Q/edit?usp=sharing>.
It's a bit chaotic, with several worksheets. I won't be able to explain
further here, and it is a suspended work in progress: we have set some
things up but left the tasks pending for potential volunteers and interns
to take up. For those who want to know more about this, please reply with a
different subject line or post your queries on the #pune channel in
datameet.slack.com.

We did have one immediate requirement: producing a shapefile for a web
interface that Namita was developing
<https://bnamita.github.io/Village_Mapping_v2/>, which dynamically combines
shapefile and census CSV data. We needed the shapefile to have a census
code column/attribute that is non-null and unique, to act as the primary
key to match with census data. So I created a version with a new column
in which the repeating and null codes were suffixed with serial numbers.
(That renders them unmatchable with the census data, but at least they do
not interfere with the program.) I have documented the process here
<https://docs.google.com/document/d/e/2PACX-1vQO_bAKdtsoC61POlkmRUp32p1NfdxXtNqZ4Rk2gcEJdphPBtyiwKxVSzuFVnZSlN2ShEBcnQffdSL8/pub>.
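
For anyone who wants to try the same trick, the suffixing step can be sketched roughly as below. This is only an assumption-laden illustration in Python: the function name, the "NULL" prefix and the "_1", "_2" suffix format are invented here and may differ from what the linked document describes.

```python
from collections import Counter


def make_unique_keys(codes, null_prefix="NULL"):
    """Return one key per input code that is non-null and unique.

    Codes that occur exactly once are kept unchanged. Repeated codes get
    "_1", "_2", ... appended, and missing codes become "NULL_1", "NULL_2", ...
    Assumes no real census code already looks like "<code>_<n>".
    """
    def is_null(code):
        return code is None or str(code).strip() == ""

    # Count how often each real (non-null) code occurs.
    counts = Counter(str(c).strip() for c in codes if not is_null(c))
    seen = Counter()
    keys = []
    for code in codes:
        if is_null(code):
            seen[null_prefix] += 1
            keys.append(f"{null_prefix}_{seen[null_prefix]}")
            continue
        code = str(code).strip()
        if counts[code] == 1:
            keys.append(code)  # already unique, still matches census data
        else:
            seen[code] += 1
            keys.append(f"{code}_{seen[code]}")  # no longer matches census data
    return keys


# Made-up codes: one duplicate pair and two missing values.
keys = make_unique_keys(["100", "200", "200", None, "", "300"])
```

Writing the result to a new column, rather than over the original code column, keeps the untouched codes available for later correction.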

So my suggestion for the all-states effort would be to document all the
changes and fixes made, and to keep ways of tracing them back (adding new
columns and making changes there is one such way). It's much tougher, but it
will help set up a situation where the fixes can be integrated back into
official sources rather than being isolated forks. We can be skeptical of
that ever happening, but the alternative is to repeat the entire exercise
after the 2021 census or any time the govt agencies republish the shapes. In
any case, keeping documentation of changes will speed up the next round, as
the mistakes we find now will likely be repeated later.

PS: By "we" I am referring to myself, Craig, Devdatta, Namita, Riddhi,
Jinda and some more people from Pune who contributed to the process
(apologies for missing names!). We did a few meetups and then made a
smaller focus group.


--
Cheers,
Nikhil VJ
+91-966-583-1250
Pune, India
Website <http://nikhilvj.co.in>
DataMeet Pune chapter <https://datameet-pune.github.io/>
Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
Contribute <https://www.instamojo.com/@nikhilvj/>

On Fri, Mar 23, 2018 at 9:13 AM, Justin <justinelliotmey...@gmail.com>
wrote:

> okay; started to upload the data:
> https://github.com/justinelliotmeyers/INDIA_2018_SHAPEFILE_BOUNDARIES
>
> please give any feedback, point out errors, what I did wrong, best way to
> move forward, etc. This is v1, so I expect issues to exist, and missing
> locations, bad spellings, code issues, etc.
>
> I tried to clean it up as much as possible - in order to get this pushed
> out before the weekend I dissolved to a grid to keep geometry size down. We
> can review geometry after initial review of attributes.
>
> I just threw everything in a repo; we can upload to a more formal project
> after this first stage, just wanted to start.
>
> Thanks!!!!!
>
> --
> Datameet is a community of Data Science enthusiasts in India. Know more
> about us by visiting http://datameet.org
> ---
> You received this message because you are subscribed to the Google Groups
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to datameet+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
