Hi All,

I am always looking for a comprehensive list of GOI websites in an easily
consumable format for various projects, so I decided to scrape
http://goidirectory.nic.in/index.php. (Yes, there is no HTTPS for this
link.)

I have dumped a list of websites:
https://raw.githubusercontent.com/konarkmodi/DigitalIndia/master/data/list_domains.json

*Number of Websites:* 10741

Suffix     Count
.gov.in    4805
.nic.in    2766
.org       855
.com       566
.ac.in     499
.in        485
.co.in     209
.org.in    176
.res.in    158
.edu.in    110
.net       37
.edu       26
.net_in    9
.info      7
.aero      2
.gen_in    1
.coop      1
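
For anyone who wants to recompute suffix counts like those above, here is a minimal sketch in Python. It is not the script used to produce the numbers in this post. It assumes the input is a flat list of hostnames or URLs (the actual layout of list_domains.json may differ), and the names TWO_LABEL_SUFFIXES, suffix_of and count_suffixes are hypothetical helpers. The two-label suffixes are taken from the table and are not exhaustive.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumption: two-label suffixes seen in the table above; extend as needed.
TWO_LABEL_SUFFIXES = {
    "gov.in", "nic.in", "ac.in", "co.in", "org.in",
    "res.in", "edu.in", "net.in", "gen.in",
}


def suffix_of(entry):
    """Return the suffix (e.g. '.gov.in') for a hostname or URL, or '' if none."""
    # urlparse only fills in the hostname when a '//' prefix is present.
    target = entry if "://" in entry else "//" + entry
    host = (urlparse(target).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    last_two = ".".join(labels[-2:])
    if len(labels) >= 2 and last_two in TWO_LABEL_SUFFIXES:
        return "." + last_two
    return "." + labels[-1] if labels[-1] else ""


def count_suffixes(entries):
    """Count entries per suffix, skipping entries with no usable hostname."""
    return Counter(s for s in map(suffix_of, entries) if s)


if __name__ == "__main__":
    # Sample input for illustration only, not the scraped data.
    sample = ["http://goidirectory.nic.in/index.php", "india.gov.in", "datameet.org"]
    for suffix, count in count_suffixes(sample).most_common():
        print(suffix, count)
```

A more robust version would use the Public Suffix List rather than a hand-written set, at the cost of an extra dependency.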


I hope this list is useful for a number of projects and studies.

Please feel free to add missing domains or any other relevant information.
The working repo is: https://github.com/konarkmodi/DigitalIndia


-Konark
@konarkmodi
