Hi All, I am always looking for a comprehensive list of GOI (Government of India) websites in a consumable format for various projects, so I decided to scrape http://goidirectory.nic.in/index.php. (YES! There is no HTTPS for this link.)
I have dumped the list of websites here: https://raw.githubusercontent.com/konarkmodi/DigitalIndia/master/data/list_domains.json

*Number of Websites:* 10741

Suffix     Count
.gov.in    4805
.nic.in    2766
.org       855
.com       566
.ac.in     499
.in        485
.co.in     209
.org.in    176
.res.in    158
.edu.in    110
.net       37
.edu       26
.net_in    9
.info      7
.aero      2
.gen_in    1
.coop      1

I hope this list is useful for projects and studies. Please feel free to add missing domains or any other relevant information; the working repo is: https://github.com/konarkmodi/DigitalIndia

-Konark
@konarkmodi
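P.S. For anyone who wants to recompute a suffix breakdown like the one above, here is a minimal sketch in Python. It is not the script that produced the numbers in this post. It assumes the JSON file is a flat array of bare hostnames (I have not confirmed the format here), and the names `KNOWN_SUFFIXES`, `suffix_of`, `count_suffixes`, and `load_domains` are made up for illustration. The suffix list only covers the dotted suffixes from the table, and matching is longest-suffix-first so that `.gov.in` is not counted as plain `.in`.

```python
import json
from collections import Counter

# Illustrative suffix list taken from the table in this post (dotted forms only).
KNOWN_SUFFIXES = [
    ".gov.in", ".nic.in", ".ac.in", ".co.in", ".org.in", ".res.in", ".edu.in",
    ".in", ".org", ".com", ".net", ".edu", ".info", ".aero", ".coop",
]


def suffix_of(domain, suffixes=KNOWN_SUFFIXES):
    """Return the longest known suffix that the hostname ends with, else 'other'."""
    host = domain.strip().lower().rstrip(".")
    # Try longer suffixes first so ".gov.in" wins over ".in".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if host.endswith(suffix):
            return suffix
    return "other"


def count_suffixes(domains):
    """Tally how many hostnames fall under each suffix."""
    return Counter(suffix_of(d) for d in domains)


def load_domains(path):
    """Load hostnames from a local copy of the JSON file.

    Assumption: the file is a JSON array of hostname strings.
    """
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

To use it, download list_domains.json locally and run something like `count_suffixes(load_domains("list_domains.json")).most_common()`. If the file turns out to hold full URLs or objects rather than plain hostnames, `load_domains` would need adjusting first.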