Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2021-03-10 Thread Pratap Vardhan
I've updated the data files today; they now have about 7,83,014 facilities. 
You can download individual state files from 
https://github.com/pratapvardhan/rural-facilities-pmgsy
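
For anyone who wants a quick look at a downloaded state file, here is a minimal sketch that counts facilities per category using only the Python standard library. The sample rows and the column names (`facility_name`, `category`, `latitude`, `longitude`) are made up for illustration and are an assumption, not the repository's actual schema; check the header of the file you download and adjust `category_column` accordingly.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for one downloaded state file.
# Rows and column names are invented for illustration only.
SAMPLE_CSV = """facility_name,category,latitude,longitude
Sample High School 1,High School,26.4499,80.3319
Sample PHC 1,PHC,26.4510,80.3342
Sample Bus Stand 1,Bus Stand,26.4478,80.3301
Sample High School 2,High School,27.5680,80.6790
"""


def count_by_category(csv_file, category_column="category"):
    """Count facilities per category from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[category_column].strip() for row in reader)


# For a real file, replace io.StringIO(...) with open(path, newline="", encoding="utf-8").
counts = count_by_category(io.StringIO(SAMPLE_CSV))
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

Because category definitions vary between states, counts like these are probably best compared within a state rather than across states.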



On Tuesday, March 2, 2021 at 11:47:43 AM UTC+5:30 nikh...@gmail.com wrote:

> Hi Harsh,
>
> The PMGSY site is dizzyingly full of data! Kudos and gratitude to all the 
> people who have been working on it and the govt / elected officials who 
> supported its release to the public. Sets a great benchmark / precedent.
>
> Even apart from the data itself, the hierarchy in the dropdown selects is 
> valuable too as people can use that for mapping so many other things in 
> other fields.
>
> I'm not able to see geo-tagging in the sections I'm checking out. Can you 
> guide pls? 
>
> Suggestion : Make short screen recording videos on youtube showing how to 
> use the site. There's a lot of free tools and sites for it, but if zoom is 
> already there then one can start a call with recording on and screen-share 
> and do the job.
>
> --
> Cheers,
> Nikhil VJ
> https://nikhilvj.co.in
>
>
> On Mon, Mar 1, 2021 at 9:03 PM Arun Ganesh  wrote:
>
>>
>>>
>>> Does anyone know how to get this data ported to OSM (if at all that's a 
>>> possibility)?
>>>
>>>
>> Importing data to OSM is possible, but since it will have to be conflated 
>> with any existing data, it will require quite a bit of data 
>> preparation with many volunteers. An example of a recent import was the 
>> data.gov.in health facilities dataset 
>> https://wiki.openstreetmap.org/wiki/India_Health_Facilities_Import . An 
>> overview of the import process is here: 
>> https://wiki.openstreetmap.org/wiki/Import/Guidelines . If the data 
>> quality is not consistent and requires manual cleanup, going for an import 
>> might be a lot of effort.
>>
>> That said, the PMGSY data is quite valuable and can add a lot of missing 
>> info into OSM for rural areas. It would make sense to start a conversation 
>> with the OSM community on ideas and how to take this forward. A good way to 
>> begin is by sending an intro email to the mailing list 
>> https://lists.openstreetmap.org/listinfo/talk-in and starting a 
>> conversation on it on the telegram group https://t.me/OSMIndia . There 
>> are quite a few folks experienced with OSM imports who can help out.

-- 
Datameet is a community of Data Science enthusiasts in India. Know more about 
us by visiting http://datameet.org
--- 
You received this message because you are subscribed to the Google Groups 
"datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to datameet+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/datameet/0cb7b99d-0eb0-4ba2-9953-8a22a29e7311n%40googlegroups.com.


Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2021-03-01 Thread Nikhil VJ
Hi Harsh,

The PMGSY site is dizzyingly full of data! Kudos and gratitude to all the
people who have been working on it and the govt / elected officials who
supported its release to the public. Sets a great benchmark / precedent.

Even apart from the data itself, the hierarchy in the dropdown selects is
valuable too as people can use that for mapping so many other things in
other fields.

I'm not able to see geo-tagging in the sections I'm checking out. Can you
guide pls?

Suggestion : Make short screen recording videos on youtube showing how to
use the site. There's a lot of free tools and sites for it, but if zoom is
already there then one can start a call with recording on and screen-share
and do the job.

--
Cheers,
Nikhil VJ
https://nikhilvj.co.in


On Mon, Mar 1, 2021 at 9:03 PM Arun Ganesh  wrote:

>
>>
>> Does anyone know how to get this data ported to OSM (if at all that's a
>> possibility)?
>>
>>
> Importing data to OSM is possible, but since it will have to be conflated
> with any existing data, it will require quite a bit of data
> preparation with many volunteers. An example of a recent import was the
> data.gov.in health facilities dataset
> https://wiki.openstreetmap.org/wiki/India_Health_Facilities_Import . An
> overview of the import process is here:
> https://wiki.openstreetmap.org/wiki/Import/Guidelines . If the data
> quality is not consistent and requires manual cleanup, going for an import
> might be a lot of effort.
>
> That said, the PMGSY data is quite valuable and can add a lot of missing
> info into OSM for rural areas. It would make sense to start a conversation
> with the OSM community on ideas and how to take this forward. A good way to
> begin is by sending an intro email to the mailing list
> https://lists.openstreetmap.org/listinfo/talk-in and starting a
> conversation on it on the telegram group https://t.me/OSMIndia . There
> are quite a few folks experienced with OSM imports who can help out.


Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2021-03-01 Thread Arun Ganesh
>
>
>
> Does anyone know how to get this data ported to OSM (if at all that's a
> possibility)?
>
>
Importing data to OSM is possible, but since it will have to be conflated
with any existing data, it will require quite a bit of data
preparation with many volunteers. An example of a recent import was the
data.gov.in health facilities dataset
https://wiki.openstreetmap.org/wiki/India_Health_Facilities_Import . An
overview of the import process is here:
https://wiki.openstreetmap.org/wiki/Import/Guidelines . If the data quality
is not consistent and requires manual cleanup, going for an import might be
a lot of effort.
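
As a rough illustration of what "conflation" involves, the sketch below flags import candidates that sit close to an existing OSM feature so they can be reviewed by hand instead of being uploaded blindly. It is only a sketch under stated assumptions: the 200 m threshold, the dictionary layout and the sample coordinates are all hypothetical, and a real import would follow the Import Guidelines and compare names and tags as well as distance.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def split_for_review(candidates, existing, threshold_m=200.0):
    """Split candidates into likely-new points and possible duplicates.

    A candidate is a possible duplicate if any existing feature lies
    within threshold_m metres of it (threshold is an assumed value).
    """
    likely_new, possible_duplicates = [], []
    for cand in candidates:
        nearest = min(
            (haversine_m(cand["lat"], cand["lon"], e["lat"], e["lon"]) for e in existing),
            default=math.inf,
        )
        if nearest <= threshold_m:
            possible_duplicates.append(cand)
        else:
            likely_new.append(cand)
    return likely_new, possible_duplicates


# Made-up points for illustration only.
existing_osm = [{"name": "existing clinic", "lat": 12.9716, "lon": 77.5946}]
import_candidates = [
    {"name": "candidate A", "lat": 12.9717, "lon": 77.5947},  # a few metres away
    {"name": "candidate B", "lat": 13.0716, "lon": 77.5946},  # roughly 11 km away
]

likely_new, possible_duplicates = split_for_review(import_candidates, existing_osm)
print("likely new:", [c["name"] for c in likely_new])
print("review by hand:", [c["name"] for c in possible_duplicates])
```

This brute-force comparison checks every candidate against every existing feature, which is fine for a block or district but would need a spatial index for the full dataset.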

That said, the PMGSY data is quite valuable and can add a lot of missing
info into OSM for rural areas. It would make sense to start a conversation
with the OSM community on ideas and how to take this forward. A good way to
begin is by sending an intro email to the mailing list
https://lists.openstreetmap.org/listinfo/talk-in and starting a
conversation on it on the telegram group https://t.me/OSMIndia . There are
quite a few folks experienced with OSM imports who can help out.



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2021-03-01 Thread Harsh Nisar


On Wednesday, 20 January 2021 at 11:40:46 UTC+5:30 prkraj...@gmail.com 
wrote:

> Hello Harsh and everyone,
> This is great work! 
>
> Would it be possible to include these two pieces of info in the FAQs: 
> 1. Date/ Month of the opening of the dataset to the public 
> 2. The survey period over which this dataset was collected (I understand 
> the survey is ongoing for some states, so maybe at least the start month 
> would be useful to have).
> Thanks!
>

Yes - I'll get that updated. Thanks.

Does anyone know how to get this data ported to OSM (if at all that's a 
possibility)?

 

>
> On Sat, Nov 21, 2020 at 10:49 AM nisar...@gmail.com  
> wrote:
>
>> Hello all,
>>
>> *ANN:* PMGSY has opened data for about 7,00,000 geo-tagged rural 
>> facilities across India. 
>>
>> The data was collected to help plan road investments in PMGSY-III. It was 
>> collected over the last year and counting. Depending on which state's data 
>> you download either the survey activity is completed or still under-process.
>>
>> The list of facilities which were to be surveyed as per guidelines of the 
>> scheme can be seen on Pg 37 of the PMGSY-III Guidelines (
>> https://pmgsy.nic.in/sites/default/files/PMGSY_III_guidelines.pdf)
>>
>> Eg. High Schools, Higher Secondary Schools, Vet Hospitals, PHCs, CHCs, 
>> Bedded Hospitals, Bus Stands, Block HQs, Panchayat HQs, Banks, Fuel 
>> Stations, Cold Storages, Agro Industries, Pack Houses, Collection Centres 
>> etc.
>>
>> Data opened includes name of facility, address, category, sub-category 
>> and lat/long.
>>
>> Some context:
>> While a common android application was used for this data collection 
>> there was no in-depth centralized training/SOP for how the data was to be 
>> collected and states were given freedom to interpret the definition of the 
>> facilities which need to be surveyed as long as they met the overarching 
>> categories and goals. Eg. Some states would have considered privately owned 
>> facilities as well for certain categories or would have interpreted 
>> bus-stands to include taxi-stands if that's the only relevant means of 
>> transport or not considered weekly haats for agro-markets etc. There is no 
>> documentation for these variations. Once the survey is completed in a 
>> Block it won't be updated in the future. 
>>
>> Even within a state you'll find variation because different divisions may 
>> have undertaken the survey independently with different levels of 
>> completeness, intent and accuracy. No standard mobile was used and GPS 
>> accuracy will vary from place to place. Further, the surveyors could be 
>> either on contract or government engineers. Treating it as a census may 
>> lead to claims of little substance. 
>>
>> Nevertheless, it was a massive exercise and hopefully of some secondary 
>> use as well.
>>
>> License is Open Data License - India (s/o Naveen Francis) and you can 
>> download data for one state at a time. Other disclaimers are on the 
>> website. 
>>
>> Link: http://omms.nic.in Other Reports -> Facility Details
>>
>> PS. Any pointers on how to collect citation metrics for this dataset are 
>> appreciated. It may help create a case for future such attempts to open 
>> data.
>>
>> Regards,
>> Harsh Nisar


Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2021-01-19 Thread Rajesvari Parasa
Hello Harsh and everyone,
This is great work!

Would it be possible to include these two pieces of info in the FAQs:
1. Date/ Month of the opening of the dataset to the public
2. The survey period over which this dataset was collected (I understand
the survey is ongoing for some states, so maybe at least the start month
would be useful to have).
Thanks!

On Sat, Nov 21, 2020 at 10:49 AM nisar...@gmail.com 
wrote:

> Hello all,
>
> *ANN:* PMGSY has opened data for about 7,00,000 geo-tagged rural
> facilities across India.
>
> The data was collected to help plan road investments in PMGSY-III. It was
> collected over the last year and counting. Depending on which state's data
> you download either the survey activity is completed or still under-process.
>
> The list of facilities which were to be surveyed as per guidelines of the
> scheme can be seen on Pg 37 of the PMGSY-III Guidelines (
> https://pmgsy.nic.in/sites/default/files/PMGSY_III_guidelines.pdf)
>
> Eg. High Schools, Higher Secondary Schools, Vet Hospitals, PHCs, CHCs,
> Bedded Hospitals, Bus Stands, Block HQs, Panchayat HQs, Banks, Fuel
> Stations, Cold Storages, Agro Industries, Pack Houses, Collection Centres
> etc.
>
> Data opened includes name of facility, address, category, sub-category and
> lat/long.
>
> Some context:
> While a common android application was used for this data collection there
> was no in-depth centralized training/SOP for how the data was to be
> collected and states were given freedom to interpret the definition of the
> facilities which need to be surveyed as long as they met the overarching
> categories and goals. Eg. Some states would have considered privately owned
> facilities as well for certain categories or would have interpreted
> bus-stands to include taxi-stands if that's the only relevant means of
> transport or not considered weekly haats for agro-markets etc. There is no
> documentation for these variations. Once the survey is completed in a
> Block it won't be updated in the future.
>
> Even within a state you'll find variation because different divisions may
> have undertaken the survey independently with different levels of
> completeness, intent and accuracy. No standard mobile was used and GPS
> accuracy will vary from place to place. Further, the surveyors could be
> either on contract or government engineers. Treating it as a census may
> lead to claims of little substance.
>
> Nevertheless, it was a massive exercise and hopefully of some secondary
> use as well.
>
> License is Open Data License - India (s/o Naveen Francis) and you can
> download data for one state at a time. Other disclaimers are on the
> website.
>
> Link: http://omms.nic.in Other Reports -> Facility Details
>
> PS. Any pointers on how to collect citation metrics for this dataset are
> appreciated. It may help create a case for future such attempts to open
> data.
>
> Regards,
> Harsh Nisar


Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2021-01-11 Thread Arun Ganesh
On Mon, Jan 11, 2021 at 4:10 PM krit...@gmail.com 
wrote:

> Really useful to have this FAQ! Just a quick question is there a
> habitation to PC11/LGD village mapping anywhere?
> In case we want to match this at the village level?
>

A lookup of the PMGSY data block names to block LGD codes was crowdsourced
here: https://github.com/bkamapantula/pmgsy/issues/1#issuecomment-735031364

Going the next level to map habitation to village LGD code would be amazing.
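
For anyone attempting the same kind of name-to-code lookup, here is a minimal sketch of one way to do it with the standard library: exact match after normalisation, then a fuzzy match, with an explicit alias table for renamed places (fuzzy matching alone cannot link Allahabad to Prayagraj). The codes below are placeholders rather than real LGD codes, and the function names, alias table and 0.85 cutoff are assumptions for illustration, not the method used in the crowdsourced sheet.

```python
import difflib


def normalise(name):
    """Lower-case, turn hyphens into spaces, and collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").split())


def match_names(pmgsy_names, lgd_codes, aliases=None, cutoff=0.85):
    """Map PMGSY names to LGD codes; return (matched, unmatched).

    aliases maps an old or variant PMGSY name to the current LGD name.
    """
    aliases = aliases or {}
    lgd_by_norm = {normalise(n): code for n, code in lgd_codes.items()}
    matched, unmatched = {}, []
    for raw in pmgsy_names:
        key = normalise(aliases.get(raw, raw))
        if key in lgd_by_norm:
            matched[raw] = lgd_by_norm[key]
            continue
        # Fall back to the single closest name above the similarity cutoff.
        close = difflib.get_close_matches(key, list(lgd_by_norm), n=1, cutoff=cutoff)
        if close:
            matched[raw] = lgd_by_norm[close[0]]
        else:
            unmatched.append(raw)
    return matched, unmatched


# Placeholder codes, NOT real LGD codes.
lgd_codes = {"Prayagraj": "D001", "Lucknow": "D002", "Sitapur": "D003"}
aliases = {"Allahabad": "Prayagraj"}  # renamed district mentioned in this thread
pmgsy_names = ["Allahabad", "Luckhnow", "Sitapur", "Unknownpur"]

matched, unmatched = match_names(pmgsy_names, lgd_codes, aliases)
print("matched:", matched)
print("needs manual lookup:", unmatched)
```

Anything left in the unmatched list still needs a manual lookup, and fuzzy matches are worth spot-checking since similar names can belong to different places.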



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2021-01-11 Thread krit...@gmail.com
Really useful to have this FAQ! Just a quick question is there a habitation 
to PC11/LGD village mapping anywhere? 
In case we want to match this at the village level?

On Wednesday, 6 January 2021 at 14:59:00 UTC+5:30 Harsh Nisar wrote:

> Closing the loop: The team has uploaded FAQs on the main website and given 
> a static link for attribution & sharing.
>
> http://omms.nic.in/Home/PMGSYRuralDataset/ 
>
> On Saturday, 5 December 2020 at 06:36:33 UTC+5:30 krit...@gmail.com wrote:
>
>> Ah makes sense! Still, i look forward to the dataset FAQ getting 
>> uploaded. But thank you so much for the quick clarification. 
>>
>> Regards,
>> Kritarth
>> On Thursday, 3 December 2020 at 12:23:27 UTC+5:30 Harsh Nisar wrote:
>>
>>> Hi,
>>>
>>> The definitions were purposely kept loose, as the primary focus of the 
>>> dataset was to aid *local* selection of roads for a government 
>>> programme. In such cases, consistency of definitions within a 
>>> block/district/state was assumed to be more important than consistency 
>>> across geographies. The primary focus isn't a comparative census.
>>>
>>> So the definitions are as assumed by the JE/AE (frontline road 
>>> engineers) residing in the Block; in some states they were borrowed from 
>>> the census. There is no documented consistency, but the variance in high 
>>> schools will be minimal compared with, say, agro industry.
>>>
>>> The generic list of facilities to be surveyed is in the guidelines 
>>> document, Annexure 1, Pg 37.
>>>
>>> We are in the process of getting a dataset FAQ uploaded on OMMAS.
>>>
>>> Regards,
>>> Harsh
>>>
>>> On Monday, 30 November 2020 at 17:54:01 UTC+5:30 krit...@gmail.com 
>>> wrote:
>>>
>>>> Hi Harsh, All;
>>>>
>>>> Thank you for pointing this dataset out and storing it in an easy to
>>>> access location. It looks super useful.
>>>>
>>>> I was trying to find how exactly they've classified schools and
>>>> hospitals and couldn't find anything in this PMGSY documentation.
>>>> For example the census 2011 documents what constitutes a high school and
>>>> a higher secondary school for example here.
>>>>
>>>> I wonder if they've used the same definitions for schools and hospitals
>>>> as the census? Does anyone have any information on how they've chosen
>>>> what public facilities to document or is it pretty ad hoc as Harsh had
>>>> indicated in the opening post on this thread?
>>>>
>>>> Regards,
>>>> Kritarth Jha
>>>>
>>>>
>>>> On Saturday, 28 November 2020 at 11:44:41 UTC+5:30 Arun Ganesh wrote:
>>>>
>>>>> Btw, Bhanu and I have made some great progress matching the LGD codes
>>>>> in the last hour. 100% of districts matched and it looks like the PMGSY
>>>>> district names are definitely older (eg. Allahabad vs Prayagraj).
>>>>>
>>>>> Down to 1539 unmatched blocks which can be finished with some help.
>>>>> Feel free to request access to above sheet if you would like to help out.
>>>>> Instructions on column I & H. You basically need to copy and paste the
>>>>> matching lookup key from the lgd block sheet to the pmgsy sheet.
>>>>>




Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2021-01-06 Thread Harsh Nisar
 Closing the loop: The team has uploaded FAQs on the main website and given 
a static link for attribution & sharing.

http://omms.nic.in/Home/PMGSYRuralDataset/ 

On Saturday, 5 December 2020 at 06:36:33 UTC+5:30 krit...@gmail.com wrote:

> Ah makes sense! Still, i look forward to the dataset FAQ getting uploaded. 
> But thank you so much for the quick clarification. 
>
> Regards,
> Kritarth
> On Thursday, 3 December 2020 at 12:23:27 UTC+5:30 Harsh Nisar wrote:
>
>> Hi,
>>
>> The definitions were purposely kept loose, as the primary focus of the 
>> dataset was to aid *local* selection of roads for a government 
>> programme. In such cases, consistency of definitions within a 
>> block/district/state was assumed to be more important than consistency 
>> across geographies. The primary focus isn't a comparative census.
>>
>> So the definitions are as assumed by the JE/AE (frontline road engineers) 
>> residing in the Block; in some states they were borrowed from the census. 
>> There is no documented consistency, but the variance in high schools 
>> will be minimal compared with, say, agro industry.
>>
>> The generic list of facilities to be surveyed is in the guidelines 
>> document, Annexure 1, Pg 37.
>>
>> We are in the process of getting a dataset FAQ uploaded on OMMAS.
>>
>> Regards,
>> Harsh
>>
>> On Monday, 30 November 2020 at 17:54:01 UTC+5:30 krit...@gmail.com wrote:
>>
>>> Hi Harsh, All; 
>>>
>>> Thank you for pointing this dataset out and storing it in an easy to 
>>> access location. It looks super useful. 
>>>
>>> I was trying to find how exactly they've classified schools and 
>>> hospitals and couldn't find anything in this PMGSY documentation. 
>>> For example the census 2011 documents what constitutes a high school and a 
>>> higher secondary school for example here.
>>>   
>>> I wonder if they've used the same definitions for schools and hospitals as 
>>> the census? Does anyone have any information on how they've chosen what 
>>> public facilities to document or is it pretty ad hoc as Harsh had indicated 
>>> in the opening post on this thread?
>>>
>>> Regards,
>>> Kritarth Jha
>>>
>>>
>>> On Saturday, 28 November 2020 at 11:44:41 UTC+5:30 Arun Ganesh wrote:
>>>
>>>> Btw, Bhanu and I have made some great progress matching the LGD codes
>>>> in the last hour. 100% of districts matched and it looks like the PMGSY
>>>> district names are definitely older (eg. Allahabad vs Prayagraj).
>>>>
>>>> Down to 1539 unmatched blocks which can be finished with some help.
>>>> Feel free to request access to above sheet if you would like to help out.
>>>> Instructions on column I & H. You basically need to copy and paste the
>>>> matching lookup key from the lgd block sheet to the pmgsy sheet.
>>>>
>>>



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-12-04 Thread krit...@gmail.com
Ah makes sense! Still, i look forward to the dataset FAQ getting uploaded. 
But thank you so much for the quick clarification. 

Regards,
Kritarth
On Thursday, 3 December 2020 at 12:23:27 UTC+5:30 Harsh Nisar wrote:

> Hi,
>
> The definitions were purposely kept loose, as the primary focus of the 
> dataset was to aid *local* selection of roads for a government 
> programme. In such cases, consistency of definitions within a 
> block/district/state was assumed to be more important than consistency 
> across geographies. The primary focus isn't a comparative census.
>
> So the definitions are as assumed by the JE/AE (frontline road engineers) 
> residing in the Block; in some states they were borrowed from the census. 
> There is no documented consistency, but the variance in high schools 
> will be minimal compared with, say, agro industry.
>
> The generic list of facilities to be surveyed is in the guidelines 
> document, Annexure 1, Pg 37.
>
> We are in the process of getting a dataset FAQ uploaded on OMMAS.
>
> Regards,
> Harsh
>
> On Monday, 30 November 2020 at 17:54:01 UTC+5:30 krit...@gmail.com wrote:
>
>> Hi Harsh, All; 
>>
>> Thank you for pointing this dataset out and storing it in an easy to 
>> access location. It looks super useful. 
>>
>> I was trying to find how exactly they've classified schools and hospitals 
>> and couldn't find anything in this PMGSY documentation. 
>> For example the census 2011 documents what constitutes a high school and a 
>> higher secondary school for example here.
>>   
>> I wonder if they've used the same definitions for schools and hospitals as 
>> the census? Does anyone have any information on how they've chosen what 
>> public facilities to document or is it pretty ad hoc as Harsh had indicated 
>> in the opening post on this thread?
>>
>> Regards,
>> Kritarth Jha
>>
>>
>> On Saturday, 28 November 2020 at 11:44:41 UTC+5:30 Arun Ganesh wrote:
>>
>>> Btw, Bhanu and I have made some great progress matching the LGD codes in 
>>> the last hour. 100% of districts matched and it looks like the PMGSY 
>>> district names are definitely older (eg. Allahabad vs Prayagraj).
>>>
>>> Down to 1539 unmatched blocks which can be finished with some help. Feel 
>>> free to request access to above sheet if you would like to help out. 
>>> Instructions on column I & H. You basically need to copy and paste the 
>>> matching lookup key from the lgd block sheet to the pmgsy sheet.
>>>
>>



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-12-02 Thread Harsh Nisar
Hi,

The definitions were purposely kept loose, as the primary focus of the 
dataset was to aid *local* selection of roads for a government programme. 
In such cases, consistency of definitions within a block/district/state 
was assumed to be more important than consistency across geographies. 
The primary focus isn't a comparative census.

So the definitions are as assumed by the JE/AE (frontline road engineers) 
residing in the Block; in some states they were borrowed from the census. 
There is no documented consistency, but the variance in high schools 
will be minimal compared with, say, agro industry.

The generic list of facilities to be surveyed is in the guidelines 
document, Annexure 1, Pg 37.

We are in the process of getting a dataset FAQ uploaded on OMMAS.

Regards,
Harsh

On Monday, 30 November 2020 at 17:54:01 UTC+5:30 krit...@gmail.com wrote:

> Hi Harsh, All; 
>
> Thank you for pointing this dataset out and storing it in an easy to 
> access location. It looks super useful. 
>
> I was trying to find how exactly they've classified schools and hospitals 
> and couldn't find anything in the PMGSY documentation. For example, the 
> census 2011 documents what constitutes a high school and a higher 
> secondary school.
>
> I wonder if they've used the same definitions for schools and hospitals as 
> the census? Does anyone have any information on how they've chosen what 
> public facilities to document or is it pretty ad hoc as Harsh had indicated 
> in the opening post on this thread?
>
> Regards,
> Kritarth Jha
>
>
> On Saturday, 28 November 2020 at 11:44:41 UTC+5:30 Arun Ganesh wrote:
>
>> Btw, Bhanu and I have made some great progress matching the LGD codes in 
>> the last hour. 100% of districts matched and it looks like the PMGSY 
>> district names are definitely older (eg. Allahabad vs Prayagraj).
>>
>> Down to 1539 unmatched blocks which can be finished with some help. Feel 
>> free to request access to above sheet if you would like to help out. 
>> Instructions on column I & H. You basically need to copy and paste the 
>> matching lookup key from the lgd block sheet to the pmgsy sheet.
>>
>



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-11-30 Thread Digvijay Bendrikar Shinde
This is Amazing!

Thank you very much!!

On Sat, Nov 21, 2020 at 10:49 AM nisar...@gmail.com 
wrote:

> Hello all,
>
> *ANN:* PMGSY has opened data for about 7,00,000 geo-tagged rural
> facilities across India.
>
> The data was collected to help plan road investments in PMGSY-III. It has
> been collected over the last year, and collection is ongoing. Depending on
> which state's data you download, the survey activity is either completed or
> still in progress.
>
> The list of facilities which were to be surveyed as per guidelines of the
> scheme can be seen on Pg 37 of the PMGSY-III Guidelines (
> https://pmgsy.nic.in/sites/default/files/PMGSY_III_guidelines.pdf)
>
> Eg. High Schools, Higher Secondary Schools, Vet Hospitals, PHCs, CHCs,
> Bedded Hospitals, Bus Stands, Block HQs, Panchayat HQs, Banks, Fuel
> Stations, Cold Storages, Agro Industries, Pack Houses, Collection Centres
> etc.
>
> Data opened includes name of facility, address, category, sub-category and
> lat/long.
>
> Some context:
> While a common android application was used for this data collection there
> was no in-depth centralized training/SOP for how the data was to be
> collected and states were given freedom to interpret the definition of the
> facilities which need to be surveyed as long as they met the overarching
> categories and goals. Eg. Some states would have considered privately owned
> facilities as well for certain categories or would have interpreted
> bus-stands to include taxi-stands if that's the only relevant means of
> transport or not considered weekly haats for agro-markets etc. There is no
> documentation for these variations. Once the survey is completed in a
> Block it won't be updated in the future.
>
> Even within a state you'll find variation because different divisions may
> have undertaken the survey independently with different levels of
> completeness, intent and accuracy. No standard mobile was used and GPS
> accuracy will vary from place to place. Further, the surveyors could be
> either on contract or government engineers. Treating it as a census may
> lead to claims of little substance.
>
> Nevertheless, it was a massive exercise and hopefully of some secondary
> use as well.
>
> License is Open Data License - India (s/o Naveen Francis) and you can
> download data for one state at a time. Other disclaimers are on the
> website.
>
> Link: http://omms.nic.in Other Reports -> Facility Details
>
> PS. Any pointers on how to collect citation metrics for this dataset are
> appreciated. It may help create a case for future such attempts to open
> data.
>
> Regards,
> Harsh Nisar
>



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-11-30 Thread krit...@gmail.com
Hi Harsh, All; 

Thank you for pointing this dataset out and storing it in an easy to access 
location. It looks super useful. 

I was trying to find how exactly they've classified schools and hospitals 
and couldn't find anything in the PMGSY documentation. For example, the 
census 2011 documents what constitutes a high school and a higher 
secondary school.

I wonder if they've used the same definitions for schools and hospitals as 
the census? Does anyone have any information on how they've chosen what 
public facilities to document or is it pretty ad hoc as Harsh had indicated 
in the opening post on this thread?

Regards,
Kritarth Jha


On Saturday, 28 November 2020 at 11:44:41 UTC+5:30 Arun Ganesh wrote:

> Btw, Bhanu and I have made some great progress matching the LGD codes in 
> the last hour. 100% of districts matched and it looks like the PMGSY 
> district names are definitely older (eg. Allahabad vs Prayagraj).
>
> Down to 1539 unmatched blocks which can be finished with some help. Feel 
> free to request access to above sheet if you would like to help out. 
> Instructions on column I & H. You basically need to copy and paste the 
> matching lookup key from the lgd block sheet to the pmgsy sheet.
>



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-11-27 Thread Arun Ganesh
Btw, Bhanu and I have made some great progress matching the LGD codes in
the last hour. 100% of districts matched and it looks like the PMGSY
district names are definitely older (eg. Allahabad vs Prayagraj).

Down to 1539 unmatched blocks, which can be finished with some help. Feel
free to request access to the above sheet if you would like to help out.
Instructions are in columns H & I. You basically need to copy and paste the
matching lookup key from the lgd block sheet to the pmgsy sheet.
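
For anyone who would rather pre-fill candidate matches before doing the manual copy and paste, here is a minimal sketch using only the Python standard library. It is not the method used for the sheet above. The names, the `LGD-000x` keys and the 0.8 cutoff are all hypothetical, and any suggestion it produces would still need a manual check.

```python
import difflib


def normalize(name):
    """Lowercase the name and collapse punctuation/whitespace to single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def suggest_matches(pmgsy_names, lgd_lookup, cutoff=0.8):
    """Return {pmgsy_name: lgd_key or None}.

    lgd_lookup maps an LGD name to its lookup key (a hypothetical layout).
    An exact match on the normalized name wins; otherwise the closest
    normalized name scoring above `cutoff` is suggested for manual review.
    """
    normalized = {normalize(name): key for name, key in lgd_lookup.items()}
    result = {}
    for name in pmgsy_names:
        norm = normalize(name)
        if norm in normalized:
            result[name] = normalized[norm]
            continue
        close = difflib.get_close_matches(norm, list(normalized), n=1, cutoff=cutoff)
        result[name] = normalized[close[0]] if close else None
    return result


# Hypothetical sample rows; the real sheets have different names and keys.
lgd_names = {"Prayagraj": "LGD-0001", "Chaka": "LGD-0002", "Karchhana": "LGD-0003"}
pmgsy_names = ["CHAKA", "Karchana", "Allahabad"]
matches = suggest_matches(pmgsy_names, lgd_names)
print(matches)
```

Spelling variants such as "Karchana" get a suggestion, while renamed places such as Allahabad vs Prayagraj come back as None, so those still need a person to map them.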



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-11-27 Thread Pratap Vardhan
I've added a note on top with source, license and last updated. Is there a 
recommended citation you'd like me to place? (Also, feel free to submit a 
PR with whatever you think might keep readers better informed.)
https://github.com/pratapvardhan/rural-facilities-pmgsy#rural-facilities-geo-tagged-dataset-pmgsy

On Friday, November 27, 2020 at 4:01:15 PM UTC-5 nisar...@gmail.com wrote:

> Rows which don't have lat-long will get updated as the work progresses in 
> the concerned states. The lat-long which are out of extent will not be 
> corrected and remain as-is because of errors in the app or simply low GPS 
> accuracy. They'll be very few. Once the survey work is complete and 
> finalized in all states; it will be mostly usable as-is from omms.nic.in. 
> It's snowing right now so survey work is on a hold.
>
> As the department released the data already in a machine readable format, 
> doesn't require scraping etc and under Open Data License, please try to 
> attribute the original source, link and license in your work and repo (4a/5 
> of the GODL).
>
> I have two concerns; primary is to ensure the data collection mechanism 
> and assumptions are documented officially and available readily (not just 
> for people subscribed to this google group) so that inferences made by 
> people on the data are grounded and more useful. The second is to find a 
> mechanism for people to cite the original source so that a case can be made 
> in the future for releasing other such datasets within the government. 
>
> On my side, it seems the best way to do this is to dedicate a static page 
> with FAQs on ommas & further release this data on the data.gov.in. Though 
> in true spirit, anyone can host it anywhere subject to Section 4 and 7 of 
> GODL.
>
>
> On Friday, 27 November 2020 at 20:01:35 UTC+5:30 prat...@gmail.com wrote:
>
>> Images seem to have been lost. Attaching them.
>> Distribution for states.  
>>
>> [image: pm1.PNG]
>> 22 rows which probably have sub-category mislabeled? 
>> [image: pm2.PNG]
>>
>> On Friday, November 27, 2020 at 9:27:56 AM UTC-5 Pratap Vardhan wrote:
>>
>>> Thanks Nisar, that's useful to know. I'll update the repo with these 
>>> pointers. 
>>> What frequency for updates would you suggest (monthly or)? If you 
>>> prefer, we can move the repo to datameet org or any other widely accessible 
>>> one and collectively edit it.  
>>> So, what I meant about coordinates data is, some rows are blank and some 
>>> have coordinates but beyond India's extent bounds. I guess they will get 
>>> fixed with updates too.
>>> Here's the distribution for states.
>>>
>>> Separately, minor issue perhaps - there are 22 rows which probably have 
>>> sub-category mislabeled.
>>>  
>>> Thanks for the details!
>>> On Friday, November 27, 2020 at 12:47:59 AM UTC-5 nisar...@gmail.com 
>>> wrote:
>>>
>>>> Really beautiful!
>>>>
>>>> I'll answer some of the queries you raised on the tweet thread here for
>>>> everyone.
>>>>
>>>> You've commented that the data in UK, HP & Nagaland appears erroneous.
>>>> I am assuming because the lat/long is missing. It is so because these
>>>> states are still doing the survey and haven't completed. They may complete
>>>> it in the next few months. Many of the other NE states are also in a
>>>> similar position. Goa and most UTs haven't been onboarded to the scheme
>>>> yet. For the same reason, I would point people towards to original dataset
>>>> or put in a system to update your GH repos regularly.
>>>>
>>>> Apart from that - I'll re-iterate that the data was collected by
>>>> government rural engineers at the block level. Intention, accuracy and even
>>>> understanding of definitions will vary across blocks/districts and
>>>> especially states. The data serves its primary purpose with these
>>>> assumptions but may lead to misleading statistics if treated as a census
>>>> for cross-geography comparisons.
>>>>
>>>>
>>>> On Friday, 27 November 2020 at 10:50:23 UTC+5:30 prat...@gmail.com
>>>> wrote:
>>>>
>>>>> I've pulled states csvs to this repo
>>>>> https://github.com/pratapvardhan/rural-facilities-pmgsy. Consolidated
>>>>> India csv is at
>>>>> https://www.kaggle.com/pratapvardhan/770k-geotagged-rural-facilities-in-india-pmgsy
>>>>> And, posted a thread of couple of visuals and minor data issues here
>>>>> https://twitter.com/PratapVardhan/status/1332174593877020673
>>>>> Would love to hear if you use this data to create something.
>>>>>
>>>>> On Monday, November 23, 2020 at 2:05:47 AM UTC-5 Arun Ganesh wrote:
>>>>>
>>>>>> Wonderful Harsh, its amazing to see such rural spatial datasets
>>>>>> opened by the government. The OpenStreetMap India community is looking
>>>>>> into the data to better get a sense of the quality to see how it could be
>>>>>> integrated with the existing OSM basemap.
>>>>>>
>>>>>> As a sample have made an interactive map of just the Kerala data if
>>>>>> anyone wants to explore:
>>>>>>

Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-11-27 Thread nisar...@gmail.com
Rows which don't have lat-long will get updated as the work progresses in 
the concerned states. Lat-long values that are out of extent will not be 
corrected and will remain as-is; they result from errors in the app or simply 
low GPS accuracy, and there will be very few of them. Once the survey work is 
complete and finalized in all states, it will be mostly usable as-is from 
omms.nic.in. It's snowing right now so survey work is on hold.
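
Since the blank and out-of-extent rows will stay in the files for now, anyone using the data may want to flag them on their side. Below is a minimal sketch, assuming a rough bounding box for India and hypothetical column names (`facility`, `lat`, `lon`). The real state CSVs may use different headers, and the box is an approximation rather than an official extent.

```python
import csv
import io

# Rough bounding box for India in degrees. This is an assumption for
# illustration, not an official extent.
LAT_RANGE = (6.0, 38.0)
LON_RANGE = (68.0, 98.0)


def classify(lat_text, lon_text):
    """Return 'missing', 'out_of_extent' or 'ok' for one row's coordinates."""
    try:
        lat, lon = float(lat_text), float(lon_text)
    except (TypeError, ValueError):
        # Blank or non-numeric values are treated as missing.
        return "missing"
    inside = (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
              and LON_RANGE[0] <= lon <= LON_RANGE[1])
    return "ok" if inside else "out_of_extent"


# Hypothetical sample rows standing in for one state's CSV.
sample = io.StringIO(
    "facility,lat,lon\n"
    "High School A,9.6726,76.524\n"
    "PHC B,,\n"
    "Bank C,0.0,0.0\n"
)
counts = {"ok": 0, "missing": 0, "out_of_extent": 0}
for row in csv.DictReader(sample):
    counts[classify(row["lat"], row["lon"])] += 1
print(counts)
```

Keeping the flag as a separate label, rather than dropping rows, means the missing rows can be re-checked when the states finish their surveys.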

As the department has already released the data in a machine-readable format 
that doesn't require scraping etc., and under the Open Data License, please 
try to attribute the original source, link and license in your work and repo 
(4a/5 of the GODL).

I have two concerns. The primary one is to ensure that the data collection 
mechanism and assumptions are documented officially and readily available 
(not just for people subscribed to this google group) so that inferences 
people make from the data are grounded and more useful. The second is to find 
a mechanism for people to cite the original source so that a case can be made 
in the future for releasing other such datasets within the government. 

On my side, it seems the best way to do this is to dedicate a static page 
with FAQs on ommas and further release this data on data.gov.in. Though 
in true spirit, anyone can host it anywhere subject to Sections 4 and 7 of 
the GODL.


On Friday, 27 November 2020 at 20:01:35 UTC+5:30 prat...@gmail.com wrote:

> Images seem to have been lost. Attaching them.
> Distribution for states.  
>
> [image: pm1.PNG]
> 22 rows which probably have sub-category mislabeled? 
> [image: pm2.PNG]
>
> On Friday, November 27, 2020 at 9:27:56 AM UTC-5 Pratap Vardhan wrote:
>
>> Thanks Nisar, that's useful to know. I'll update the repo with these 
>> pointers. 
>> What frequency for updates would you suggest (monthly or)? If you prefer, 
>> we can move the repo to datameet org or any other widely accessible one and 
>> collectively edit it.  
>> So, what I meant about coordinates data is, some rows are blank and some 
>> have coordinates but beyond India's extent bounds. I guess they will get 
>> fixed with updates too.
>> Here's the distribution for states.
>>
>> Separately, minor issue perhaps - there are 22 rows which probably have 
>> sub-category mislabeled.
>>  
>> Thanks for the details!
>> On Friday, November 27, 2020 at 12:47:59 AM UTC-5 nisar...@gmail.com 
>> wrote:
>>
>>> Really beautiful!  
>>>
>>> I'll answer some of the queries you raised on the tweet thread here for 
>>> everyone. 
>>>
>>> You've commented that the data in UK, HP & Nagaland appears erroneous. I 
>>> am assuming because the lat/long is missing. It is so because these states 
>>> are still doing the survey and haven't completed. They may complete it in 
>>> the next few months. Many of the other NE states are also in a similar 
>>> position. Goa and most UTs haven't been onboarded to the scheme yet. For 
>>> the same reason, I would point people towards to original dataset or put in 
>>> a system to update your GH repos regularly.
>>>
>>> Apart from that - I'll re-iterate that the data was collected by 
>>> government rural engineers at the block level. Intention, accuracy and even 
>>> understanding of definitions will vary across blocks/districts and 
>>> especially states. The data serves its primary purpose with these 
>>> assumptions but may lead to misleading statistics if treated as a census 
>>> for cross-geography comparisons. 
>>>
>>>
>>> On Friday, 27 November 2020 at 10:50:23 UTC+5:30 prat...@gmail.com 
>>> wrote:
>>>
>>>> I've pulled states csvs to this repo
>>>> https://github.com/pratapvardhan/rural-facilities-pmgsy. Consolidated
>>>> India csv is at
>>>> https://www.kaggle.com/pratapvardhan/770k-geotagged-rural-facilities-in-india-pmgsy
>>>> And, posted a thread of couple of visuals and minor data issues here
>>>> https://twitter.com/PratapVardhan/status/1332174593877020673
>>>> Would love to hear if you use this data to create something.
>>>>
>>>> On Monday, November 23, 2020 at 2:05:47 AM UTC-5 Arun Ganesh wrote:
>>>>
>>>>> Wonderful Harsh, its amazing to see such rural spatial datasets opened
>>>>> by the government. The OpenStreetMap India community is looking into the
>>>>> data to better get a sense of the quality to see how it could be
>>>>> integrated with the existing OSM basemap.
>>>>>
>>>>> As a sample have made an interactive map of just the Kerala data if
>>>>> anyone wants to explore:
>>>>> https://api.mapbox.com/styles/v1/planemad/ckhrbz1o6038o19piti441hd7.html?fresh=true=copy_token=pk.eyJ1IjoicGxhbmVtYWQiLCJhIjoiemdYSVVLRSJ9.g3lbg_eN0kztmsfIPxa9MQ#10.53/9.6726/76.524
>>>>>
>>>>> The dataset points are in yellow, the rest are from OSM. Some initial
>>>>> evaluations in Kerala and Karnataka suggest that the data is pretty good
>>>>> and spatially accurate within 100m.
>>>>>



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-11-27 Thread Pratap Vardhan
Images seem to have been lost. Attaching them.
Distribution for states.  

[image: pm1.PNG]
22 rows which probably have sub-category mislabeled? 
[image: pm2.PNG]

On Friday, November 27, 2020 at 9:27:56 AM UTC-5 Pratap Vardhan wrote:

> Thanks Nisar, that's useful to know. I'll update the repo with these 
> pointers. 
> What frequency for updates would you suggest (monthly or)? If you prefer, 
> we can move the repo to datameet org or any other widely accessible one and 
> collectively edit it.  
> So, what I meant about coordinates data is, some rows are blank and some 
> have coordinates but beyond India's extent bounds. I guess they will get 
> fixed with updates too.
> Here's the distribution for states.
>
> Separately, minor issue perhaps - there are 22 rows which probably have 
> sub-category mislabeled.
>  
> Thanks for the details!
> On Friday, November 27, 2020 at 12:47:59 AM UTC-5 nisar...@gmail.com 
> wrote:
>
>> Really beautiful!  
>>
>> I'll answer some of the queries you raised on the tweet thread here for 
>> everyone. 
>>
>> You've commented that the data in UK, HP & Nagaland appears erroneous. I 
>> am assuming because the lat/long is missing. It is so because these states 
>> are still doing the survey and haven't completed. They may complete it in 
>> the next few months. Many of the other NE states are also in a similar 
>> position. Goa and most UTs haven't been onboarded to the scheme yet. For 
>> the same reason, I would point people towards to original dataset or put in 
>> a system to update your GH repos regularly.
>>
>> Apart from that - I'll re-iterate that the data was collected by 
>> government rural engineers at the block level. Intention, accuracy and even 
>> understanding of definitions will vary across blocks/districts and 
>> especially states. The data serves its primary purpose with these 
>> assumptions but may lead to misleading statistics if treated as a census 
>> for cross-geography comparisons. 
>>
>>
>> On Friday, 27 November 2020 at 10:50:23 UTC+5:30 prat...@gmail.com wrote:
>>
>>> I've pulled states csvs to this repo 
>>> https://github.com/pratapvardhan/rural-facilities-pmgsy. Consolidated 
>>> India csv is at 
>>> https://www.kaggle.com/pratapvardhan/770k-geotagged-rural-facilities-in-india-pmgsy
>>> And, posted a thread of couple of visuals and minor data issues here 
>>> https://twitter.com/PratapVardhan/status/1332174593877020673
>>> Would love to hear if you use this data to create something.
>>>
>>> On Monday, November 23, 2020 at 2:05:47 AM UTC-5 Arun Ganesh wrote:
>>>
>>>> Wonderful Harsh, its amazing to see such rural spatial datasets opened
>>>> by the government. The OpenStreetMap India community is looking into the
>>>> data to better get a sense of the quality to see how it could be
>>>> integrated with the existing OSM basemap.
>>>>
>>>> As a sample have made an interactive map of just the Kerala data if
>>>> anyone wants to explore:
>>>> https://api.mapbox.com/styles/v1/planemad/ckhrbz1o6038o19piti441hd7.html?fresh=true=copy_token=pk.eyJ1IjoicGxhbmVtYWQiLCJhIjoiemdYSVVLRSJ9.g3lbg_eN0kztmsfIPxa9MQ#10.53/9.6726/76.524
>>>>
>>>> The dataset points are in yellow, the rest are from OSM. Some initial
>>>> evaluations in Kerala and Karnataka suggest that the data is pretty good
>>>> and spatially accurate within 100m.
>>>>
>>>



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-11-27 Thread Pratap Vardhan
Thanks Nisar, that's useful to know. I'll update the repo with these 
pointers. 
What frequency for updates would you suggest (monthly, or something else)? If 
you prefer, we can move the repo to the datameet org or any other widely 
accessible one and collectively edit it.  
What I meant about the coordinates data is that some rows are blank and some 
have coordinates beyond India's extent bounds. I guess they will get 
fixed with updates too.
Here's the distribution for states.

Separately, minor issue perhaps - there are 22 rows which probably have 
sub-category mislabeled.
 
Thanks for the details!
On Friday, November 27, 2020 at 12:47:59 AM UTC-5 nisar...@gmail.com wrote:

> Really beautiful!  
>
> I'll answer some of the queries you raised on the tweet thread here for 
> everyone. 
>
> You've commented that the data in UK, HP & Nagaland appears erroneous. I 
> am assuming because the lat/long is missing. It is so because these states 
> are still doing the survey and haven't completed. They may complete it in 
> the next few months. Many of the other NE states are also in a similar 
> position. Goa and most UTs haven't been onboarded to the scheme yet. For 
> the same reason, I would point people towards to original dataset or put in 
> a system to update your GH repos regularly.
>
> Apart from that - I'll re-iterate that the data was collected by 
> government rural engineers at the block level. Intention, accuracy and even 
> understanding of definitions will vary across blocks/districts and 
> especially states. The data serves its primary purpose with these 
> assumptions but may lead to misleading statistics if treated as a census 
> for cross-geography comparisons. 
>
>
> On Friday, 27 November 2020 at 10:50:23 UTC+5:30 prat...@gmail.com wrote:
>
>> I've pulled states csvs to this repo 
>> https://github.com/pratapvardhan/rural-facilities-pmgsy. Consolidated 
>> India csv is at 
>> https://www.kaggle.com/pratapvardhan/770k-geotagged-rural-facilities-in-india-pmgsy
>> And, posted a thread of couple of visuals and minor data issues here 
>> https://twitter.com/PratapVardhan/status/1332174593877020673
>> Would love to hear if you use this data to create something.
>>
>> On Monday, November 23, 2020 at 2:05:47 AM UTC-5 Arun Ganesh wrote:
>>
>>> Wonderful Harsh, its amazing to see such rural spatial datasets opened 
>>> by the government. The OpenStreetMap India community is looking into the 
>>> data to better get a sense of the quality to see how it could be integrated 
>>> with the existing OSM basemap.
>>>
>>> As a sample have made an interactive map of just the Kerala data if 
>>> anyone wants to explore: 
>>> https://api.mapbox.com/styles/v1/planemad/ckhrbz1o6038o19piti441hd7.html?fresh=true=copy_token=pk.eyJ1IjoicGxhbmVtYWQiLCJhIjoiemdYSVVLRSJ9.g3lbg_eN0kztmsfIPxa9MQ#10.53/9.6726/76.524
>>>
>>> The dataset points are in yellow, the rest are from OSM. Some initial 
>>> evaluations in Kerala and Karnataka suggest that the data is pretty good 
>>> and spatially accurate within 100m.
>>>
>>



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-11-26 Thread nisar...@gmail.com
Really beautiful!  

I'll answer some of the queries you raised on the tweet thread here for 
everyone. 

You've commented that the data in UK (Uttarakhand), HP (Himachal Pradesh) & 
Nagaland appears erroneous. I am assuming that is because the lat/long is 
missing. That is because these states are still doing the survey and haven't 
completed it. They may complete it in the next few months. Many of the other 
NE states are also in a similar position. Goa and most UTs haven't been 
onboarded to the scheme yet. For the same reason, I would point people 
towards the original dataset or put in a system to update your GH repos 
regularly.

Apart from that - I'll re-iterate that the data was collected by government 
rural engineers at the block level. Intention, accuracy and even 
understanding of definitions will vary across blocks/districts and 
especially states. The data serves its primary purpose with these 
assumptions but may lead to misleading statistics if treated as a census 
for cross-geography comparisons. 


On Friday, 27 November 2020 at 10:50:23 UTC+5:30 prat...@gmail.com wrote:

> I've pulled states csvs to this repo 
> https://github.com/pratapvardhan/rural-facilities-pmgsy. Consolidated 
> India csv is at 
> https://www.kaggle.com/pratapvardhan/770k-geotagged-rural-facilities-in-india-pmgsy
> And, posted a thread of couple of visuals and minor data issues here 
> https://twitter.com/PratapVardhan/status/1332174593877020673
> Would love to hear if you use this data to create something.
>
> On Monday, November 23, 2020 at 2:05:47 AM UTC-5 Arun Ganesh wrote:
>
>> Wonderful Harsh, its amazing to see such rural spatial datasets opened by 
>> the government. The OpenStreetMap India community is looking into the data 
>> to better get a sense of the quality to see how it could be integrated with 
>> the existing OSM basemap.
>>
>> As a sample have made an interactive map of just the Kerala data if 
>> anyone wants to explore: 
>> https://api.mapbox.com/styles/v1/planemad/ckhrbz1o6038o19piti441hd7.html?fresh=true=copy_token=pk.eyJ1IjoicGxhbmVtYWQiLCJhIjoiemdYSVVLRSJ9.g3lbg_eN0kztmsfIPxa9MQ#10.53/9.6726/76.524
>>
>> The dataset points are in yellow, the rest are from OSM. Some initial 
>> evaluations in Kerala and Karnataka suggest that the data is pretty good 
>> and spatially accurate within 100m.
>>
>



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-11-26 Thread Pratap Vardhan
I've pulled the state CSVs into this repo: 
https://github.com/pratapvardhan/rural-facilities-pmgsy. The consolidated 
India CSV is at 
https://www.kaggle.com/pratapvardhan/770k-geotagged-rural-facilities-in-india-pmgsy
I've also posted a thread with a couple of visuals and minor data issues here: 
https://twitter.com/PratapVardhan/status/1332174593877020673
Would love to hear if you use this data to create something.

On Monday, November 23, 2020 at 2:05:47 AM UTC-5 Arun Ganesh wrote:

> Wonderful Harsh, its amazing to see such rural spatial datasets opened by 
> the government. The OpenStreetMap India community is looking into the data 
> to better get a sense of the quality to see how it could be integrated with 
> the existing OSM basemap.
>
> As a sample have made an interactive map of just the Kerala data if anyone 
> wants to explore: 
> https://api.mapbox.com/styles/v1/planemad/ckhrbz1o6038o19piti441hd7.html?fresh=true=copy_token=pk.eyJ1IjoicGxhbmVtYWQiLCJhIjoiemdYSVVLRSJ9.g3lbg_eN0kztmsfIPxa9MQ#10.53/9.6726/76.524
>
> The dataset points are in yellow, the rest are from OSM. Some initial 
> evaluations in Kerala and Karnataka suggest that the data is pretty good 
> and spatially accurate within 100m.
>



Re: [datameet] ANN: Opening of 7,00,000+ Rural Points of Interests Data

2020-11-22 Thread Arun Ganesh
Wonderful Harsh, it's amazing to see such rural spatial datasets opened by
the government. The OpenStreetMap India community is looking into the data
to get a better sense of its quality and see how it could be integrated with
the existing OSM basemap.

As a sample, I have made an interactive map of just the Kerala data if anyone
wants to explore:
https://api.mapbox.com/styles/v1/planemad/ckhrbz1o6038o19piti441hd7.html?fresh=true=copy_token=pk.eyJ1IjoicGxhbmVtYWQiLCJhIjoiemdYSVVLRSJ9.g3lbg_eN0kztmsfIPxa9MQ#10.53/9.6726/76.524

The dataset points are in yellow, the rest are from OSM. Some initial
evaluations in Kerala and Karnataka suggest that the data is pretty good
and spatially accurate within 100m.
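
For anyone who wants to run a similar spot check, a minimal sketch of a distance test between a dataset point and a reference point (for example the matching OSM feature) is below. It uses the haversine great-circle formula, which is an assumption here and not necessarily how the evaluation above was done. The coordinates are hypothetical and the 100 m threshold simply mirrors the figure mentioned above.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within(dataset_point, reference_point, threshold_m=100.0):
    """True if the two (lat, lon) points lie within threshold_m of each other."""
    return haversine_m(*dataset_point, *reference_point) <= threshold_m


# Hypothetical pair of points; not taken from the actual dataset.
pmgsy_point = (9.6726, 76.5240)
osm_point = (9.6731, 76.5240)
print(round(haversine_m(*pmgsy_point, *osm_point), 1), within(pmgsy_point, osm_point))
```

Running this over matched pairs for a district would give a distribution of offsets rather than a single pass/fail figure, which may be more useful given that GPS accuracy varies from place to place.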
