I found a couple of really suspicious numbers in my Solr stats: lots of 
entries were marked as isBot=false although they probably should have been 
isBot=true.

In the config file I use:

spiderips.urls = http://iplists.com/google.txt, \
                 http://iplists.com/inktomi.txt, \
                 http://iplists.com/lycos.txt, \
                 http://iplists.com/infoseek.txt, \
                 http://iplists.com/altavista.txt, \
                 http://iplists.com/excite.txt, \
                 http://iplists.com/northernlight.txt, \
                 http://iplists.com/misc.txt, \
                 http://iplists.com/non_engines.txt


I could not find downloadable lists for Bing, Baidu, or Yahoo.
The best I saw was:
http://myip.ms/info/bots/Google_Bing_Yahoo_Facebook_etc_Bot_IP_Addresses.html
Is that reliable?

Does anybody out there have lists / sources that they can share?

Also: does the DSpace code gracefully deal with IP address patterns?
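
To make clear what I mean by "patterns", here is a rough Python sketch. It is
not DSpace code and says nothing about what DSpace actually supports; the
function name ip_matches and the three pattern forms (CIDR block, dash range,
trailing-dot prefix) are only my assumptions for illustration.

```python
import ipaddress


def ip_matches(ip: str, pattern: str) -> bool:
    """Return True if ip matches pattern.

    Hypothetical pattern forms, assumed for illustration only:
      66.249.64.0/19        CIDR block
      10.0.0.1-10.0.0.9     inclusive range
      66.249.               prefix ending on an octet boundary
      10.0.0.5              single address
    """
    addr = ipaddress.ip_address(ip)
    pattern = pattern.strip()
    if "/" in pattern:
        # CIDR block: membership test against the network
        return addr in ipaddress.ip_network(pattern, strict=False)
    if "-" in pattern:
        # Inclusive range between two addresses
        low, high = (ipaddress.ip_address(p.strip())
                     for p in pattern.split("-", 1))
        return low <= addr <= high
    if pattern.endswith("."):
        # Prefix: the trailing dot keeps the match on an octet boundary
        return ip.startswith(pattern)
    # Otherwise treat the pattern as a single address
    return addr == ipaddress.ip_address(pattern)


def is_bot(ip: str, patterns: list[str]) -> bool:
    """True if ip matches any pattern in the list."""
    return any(ip_matches(ip, p) for p in patterns)
```

With something like this, is_bot("66.249.66.1", ["66.249."]) would flag the
address, while a list containing only exact addresses would miss it. That is
the behaviour I am asking about.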

Monika

________________
Monika Mevenkamp
phone: 609-258-4161
Princeton University, Princeton, NJ 08544


_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
