Thank you @srinivas @vaishnavi for your feedback. I have also uploaded a .tsv file: https://raw.githubusercontent.com/konarkmodi/DigitalIndia/master/data/list_domains.tsv
I really like the idea of crowdsourcing. How do you want to proceed?
1. Should changes to the TSV be submitted as a PR, or opened as an issue?
2. If there is a source that needs to be scraped, should it be opened as an issue?
3. Should we use this repo as the main source, or move it somewhere more open, maybe the datameet repo?
If you know of any other resources, let me know and I will pull them in.

-Konark
@konarkmodi

On Wed, May 17, 2017 at 5:26 AM, Vaishnavi Jayakumar (Inclusive India) <vaishnavi.jayaku...@inclusiveindia.info> wrote:
> Yes please to the crowdsourcing!
>
> Mammoth task: this list alone is 10741. (And more are popping up all the time.)
>
> Old one that's missing, e.g. araiindia.com
> New one that's not been updated: sci.gov.in
>
> When, oh when, are they going to be updated to reflect the gov.in default?
> When, oh when, will we stop seeing govt officials use gmail ids for
> government work?
>
> ---------------------------------------
> *VAISHNAVI JAYAKUMAR*
> http://about.me/vjayakumar
>
> On Wed, May 17, 2017 at 7:35 AM, srinivas kodali <iota.kod...@gmail.com> wrote:
>
>> Unfortunately, this is not all the websites. There are more that are not
>> part of the directory. We should probably start crowdsourcing the others.
>>
>> Regards,
>> Srinivas Kodali
>> www.lostprogrammer.com
>>
>> On Wed, May 17, 2017 at 1:40 AM, konark modi <modi.kon...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am always looking for a comprehensive list of GOI websites in a
>>> consumable format for various projects, so I decided to scrape
>>> http://goidirectory.nic.in/index.php. (YES! There is no HTTPS for this
>>> link.)
>>>
>>> I have dumped a list of websites: https://raw.githubusercontent.com/konarkmodi/DigitalIndia/master/data/list_domains.json
>>>
>>> *Number of Websites:* 10741
>>>
>>> Suffix    Count
>>> .gov.in    4805
>>> .nic.in    2766
>>> .org        855
>>> .com        566
>>> .ac.in      499
>>> .in         485
>>> .co.in      209
>>> .org.in     176
>>> .res.in     158
>>> .edu.in     110
>>> .net         37
>>> .edu         26
>>> .net_in       9
>>> .info         7
>>> .aero         2
>>> .gen_in       1
>>> .coop         1
>>>
>>> I hope this list is useful for many projects and studies.
>>>
>>> Please feel free to add missing domains or any other relevant
>>> information. The working repo is: https://github.com/konarkmodi/DigitalIndia
>>>
>>> -Konark
>>> @konarkmodi
>>>
>>> --
>>> Datameet is a community of Data Science enthusiasts in India. Know more
>>> about us by visiting http://datameet.org
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "datameet" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to datameet+unsubscr...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
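The suffix tally in the quoted message can be reproduced from a plain list of domains. Below is a minimal Python sketch, assuming each entry is either a bare hostname or a full URL, and that suffixes are matched longest-first against a hand-picked list. `KNOWN_SUFFIXES`, `hostname`, `domain_suffix` and `suffix_counts` are hypothetical names for illustration, not part of the DigitalIndia repo, and this is not necessarily how the original counts were produced.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical subset of the suffix labels from the table. Sorting by length
# (longest first) makes ".gov.in" match before its shorter tail ".in".
KNOWN_SUFFIXES = sorted(
    [".gov.in", ".nic.in", ".ac.in", ".co.in", ".org.in", ".res.in",
     ".edu.in", ".org", ".com", ".in", ".net", ".edu", ".info",
     ".aero", ".coop"],
    key=len,
    reverse=True,
)


def hostname(entry: str) -> str:
    """Normalise an entry that may be a bare hostname or a full URL."""
    entry = entry.strip()
    if "://" in entry:
        # urlsplit().hostname drops the scheme, path and port, and lowercases.
        entry = urlsplit(entry).hostname or ""
    return entry.lower().rstrip(".")


def domain_suffix(entry: str) -> str:
    """Return the longest known suffix of the entry's hostname, else "other"."""
    host = hostname(entry)
    for suffix in KNOWN_SUFFIXES:
        if host.endswith(suffix):
            return suffix
    return "other"


def suffix_counts(entries):
    """Tally entries by suffix, most common first."""
    return Counter(domain_suffix(e) for e in entries).most_common()


if __name__ == "__main__":
    # Three entries taken from this thread plus one hypothetical domain.
    sample = [
        "http://goidirectory.nic.in/index.php",
        "araiindia.com",
        "sci.gov.in",
        "example.gov.in",
    ]
    for suffix, count in suffix_counts(sample):
        print(f"{suffix:<10}{count:>5}")
```

A hand-written list keeps the sketch small, but hosts with an unlisted suffix fall back to the longest listed tail (such as ".in") or to "other". A more robust version would likely match against the Public Suffix List instead.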