Thank you @srinivas and @vaishnavi for your feedback.

I have also uploaded a .tsv file:
https://raw.githubusercontent.com/konarkmodi/DigitalIndia/master/data/list_domains.tsv


I really like the idea of crowdsourcing. How do you want to proceed?

1. Submit changes to the TSV as a PR, or open an issue?
2. If there is a source that needs to be scraped, open an issue for it?
3. Keep this repo as the main source, or move it somewhere more open, maybe
the datameet repo?

If you know of any other resources, let me know and I will pull them in.
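
For anyone who wants to check new entries against the list, here is a minimal Python sketch of how a suffix tally like the one in the original mail could be reproduced. It is not the script used for the scrape. The names `hostname`, `count_suffixes` and `SUFFIXES` are made up for illustration, the suffix tuple is only a subset, and nothing is assumed about the column layout of the TSV: pass in whichever column holds the domain or URL.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative subset of suffixes (an assumption, not the full list).
SUFFIXES = (".gov.in", ".nic.in", ".ac.in", ".co.in", ".org", ".com", ".in")


def hostname(entry: str) -> str:
    """Return the lowercased hostname for a bare domain or a full URL."""
    entry = entry.strip().lower()
    if "://" not in entry:
        entry = "//" + entry  # lets urlparse treat a bare domain as a netloc
    return urlparse(entry).hostname or ""


def count_suffixes(entries, suffixes=SUFFIXES) -> Counter:
    """Tally entries by their longest matching suffix; the rest go to 'other'."""
    # Longest first, so "sci.gov.in" is counted as ".gov.in" and not ".in".
    ordered = sorted(suffixes, key=len, reverse=True)
    counts = Counter()
    for entry in entries:
        host = hostname(entry)
        if not host:
            continue  # skip blank lines
        match = next((s for s in ordered if host.endswith(s)), "other")
        counts[match] += 1
    return counts


if __name__ == "__main__":
    sample = ["araiindia.com", "sci.gov.in", "http://goidirectory.nic.in/index.php"]
    for suffix, n in count_suffixes(sample).most_common():
        print(suffix, n)
```

Matching the longest suffix first keeps second-level suffixes such as .gov.in from being folded into .in.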

-Konark
@konarkmodi

On Wed, May 17, 2017 at 5:26 AM, Vaishnavi Jayakumar (Inclusive India) <
vaishnavi.jayaku...@inclusiveindia.info> wrote:

> Yes please to the crowdsourcing!
>
> Mammoth task - this itself is 10741. (And more popping up all the time.)
>
> Old one that's missing for eg =  araiindia.com
> New one that's not been updated = sci.gov.in
>
> When o when are they going to be updated to reflect the gov.in default?
> When o when will we stop seeing gmail ids for government work by govt
> officials?
>
> ---------------------------------------
> *VAISHNAVI JAYAKUMAR*
> http://about.me/vjayakumar
>
> On Wed, May 17, 2017 at 7:35 AM, srinivas kodali <iota.kod...@gmail.com>
> wrote:
>
>> Unfortunately, this does not cover all the websites. There are more that
>> are not part of the directory. We should probably start crowdsourcing the
>> others.
>>
>> Regards,
>> Srinivas Kodali
>> www.lostprogrammer.com
>>
>> On Wed, May 17, 2017 at 1:40 AM, konark modi <modi.kon...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I am always looking for a comprehensive list of GOI websites in a
>>> consumable format for various projects. Hence I decided to scrape
>>> http://goidirectory.nic.in/index.php. (YES! There is no HTTPS for this
>>> link.)
>>>
>>> I have dumped a list of websites:
>>> https://raw.githubusercontent.com/konarkmodi/DigitalIndia/master/data/list_domains.json
>>>
>>> *Number of Websites:* 10741
>>> Suffix     Count
>>> .gov.in     4805
>>> .nic.in     2766
>>> .org         855
>>> .com         566
>>> .ac.in       499
>>> .in          485
>>> .co.in       209
>>> .org.in      176
>>> .res.in      158
>>> .edu.in      110
>>> .net          37
>>> .edu          26
>>> .net_in        9
>>> .info          7
>>> .aero          2
>>> .gen_in        1
>>> .coop          1
>>>
>>>
>>> I hope this list is useful for a number of projects and studies.
>>>
>>> Please feel free to add missing domains or other relevant information.
>>> The working repo is: https://github.com/konarkmodi/DigitalIndia
>>>
>>>
>>> -Konark
>>> @konarkmodi
>>>
>>> --
>>> Datameet is a community of Data Science enthusiasts in India. Know more
>>> about us by visiting http://datameet.org
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "datameet" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to datameet+unsubscr...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>
