That's probably bingbot; it's aggressive on our wiki too.

A few days ago I added the following to robots.txt:

User-agent: bingbot
Crawl-delay: 1

I haven't checked yet whether it makes a difference.
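
One offline way to check that the rule at least parses as intended is Python's standard urllib.robotparser (crawl_delay needs Python 3.6 or later). This is only a rough sketch: it shows what a parser that understands Crawl-delay reads from the two lines above, not whether bingbot actually honors the delay. The helper names, the "Googlebot" user agent and the path "/wiki/Main_Page" are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines quoted in this message.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 1
"""


def _parser_for(robots_txt):
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_delay_for(user_agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay in seconds that applies to user_agent, or None."""
    return _parser_for(robots_txt).crawl_delay(user_agent)


def may_fetch(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if user_agent is still allowed to fetch path."""
    return _parser_for(robots_txt).can_fetch(user_agent, path)


if __name__ == "__main__":
    print(crawl_delay_for("bingbot"))                 # delay read for bingbot
    print(crawl_delay_for("Googlebot"))               # no rule, so None
    print(may_fetch("Googlebot", "/wiki/Main_Page"))  # other crawlers unaffected
```

The point of the second and third calls is that the rule only slows bingbot down and leaves other crawlers untouched.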

Also, have a look at
http://www.bing.com/webmaster/help/how-to-report-an-issue-with-bingbot-25c19802

Greetings
Stip

On 01.04.2013 17:45, Jan Steinman wrote:
I periodically experience a DDoS attack from Microsoft. It appears to be their search engines, 
although I guess a bot network could be messing with reverse DNS. The attacks come from names like 
"msnbot-nn-nn-nn-nn.search.msn.com", where "nn" are byte values in the IP 
address. There will be a dozen or more crawling my site at the same time.
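
Whether the reverse DNS names are genuine can be checked with a forward-confirmed reverse DNS lookup: resolve the IP to a name, then resolve that name back and see whether the original IP is among the results. The following is a minimal sketch, not a definitive implementation. The function name and the injectable resolver parameters are hypothetical, the trusted suffix is taken only from the host names mentioned above, and it handles IPv4 addresses only.

```python
import socket

# Suffix taken from the host names mentioned in this thread (an assumption;
# the crawler may also use other domains).
TRUSTED_SUFFIXES = (".search.msn.com",)


def is_forward_confirmed(ip, suffixes=TRUSTED_SUFFIXES,
                         reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a trusted name that maps back to ip.

    reverse and forward are injectable so the logic can be tested offline.
    """
    try:
        hostname = reverse(ip)[0]           # PTR lookup: IP -> name
    except OSError:
        return False                        # no reverse record at all
    if not hostname.lower().endswith(suffixes):
        return False                        # name is outside the trusted domain
    try:
        addresses = forward(hostname)[2]    # A lookup: name -> list of IPs
    except OSError:
        return False
    return ip in addresses                  # forward result must include the IP
```

A client whose PTR record claims search.msn.com but fails this check is not coming from the address range that name actually resolves to, so it could be blocked without affecting the real crawler.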

The symptom is that these guys are so hot and heavy that the number of httpd 
instances shoots through the roof to the point that none of them get serviced 
before timing out.

I've complained to Microsoft, but of course, received no answer.

So why am I complaining here? The logs show that this is only happening to 
MediaWiki sites I host -- other, simpler sites don't seem to act like a tar pit.

I've tried adding ipfilter(8) blocks, but then they just pop up on some other subnet. 
Also, I don't want to block legit traffic coming from Microsoft. I also don't want to 
stop spidering via "robots.txt" because I want well-behaved search engines like 
Google to have access.

Anyone else seen this aggressive crawling of their wiki sites? Any ideas for 
fixing it?

Thanks for any advice offered!

----------------
:::: The way you see people is the way you treat them. -- Zig Ziglar
:::: Jan Steinman, EcoReality Co-op ::::





_______________________________________________
MediaWiki-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l

