I ended up writing a bash script that looked at all the IPs hitting the server, determined which ones belonged to Yandex, and blocked them in the firewall.
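
For anyone wanting to try something similar, here is a minimal sketch of that kind of approach, written in Python rather than bash. It is not the script described above. It assumes the client IP is the first field of each access-log line (as in Apache's common and combined log formats), and the network ranges are placeholder documentation ranges, not real Yandex ranges. The function names are made up for illustration. It only builds the firewall commands as text so they can be reviewed before anything is run.

```python
import ipaddress
from collections import Counter

# Placeholder ranges (RFC 5737 documentation networks), NOT real Yandex
# ranges. Substitute networks you have verified yourself.
ASSUMED_CRAWLER_NETWORKS = ["192.0.2.0/24", "198.51.100.0/24"]


def count_client_ips(log_lines):
    """Count requests per client IP, assuming the IP is the first field."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            counts[ipaddress.ip_address(fields[0])] += 1
        except ValueError:
            continue  # first field was not an IP address
    return counts


def ips_in_networks(ips, networks):
    """Return the IPs that fall inside any of the given CIDR networks."""
    nets = [ipaddress.ip_network(n) for n in networks]
    matched = [ip for ip in ips if any(ip in net for net in nets)]
    # Sort by (version, numeric value) so IPv4 and IPv6 can be mixed.
    return sorted(matched, key=lambda ip: (ip.version, int(ip)))


def block_commands(ips):
    """Build (but do not run) one DROP rule per IP address."""
    commands = []
    for ip in ips:
        tool = "iptables" if ip.version == 4 else "ip6tables"
        commands.append(f"{tool} -I INPUT -s {ip} -j DROP")
    return commands


if __name__ == "__main__":
    # Made-up sample lines standing in for a real access log.
    sample = [
        '192.0.2.10 - - [06/Sep/2017:05:00:00 +0000] "GET / HTTP/1.1" 200 512',
        '192.0.2.10 - - [06/Sep/2017:05:00:01 +0000] "GET /a HTTP/1.1" 200 512',
        '203.0.113.5 - - [06/Sep/2017:05:00:02 +0000] "GET / HTTP/1.1" 200 512',
    ]
    counts = count_client_ips(sample)
    matched = ips_in_networks(counts, ASSUMED_CRAWLER_NETWORKS)
    for command in block_commands(matched):
        print(command)
```

Matching on IP ranges rather than user agent fits the situation described here, where not every request identified itself as Yandex. On a cPanel server the block would more likely go through whatever firewall front end is already installed, so treat the generated iptables lines as illustrative only.
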
We tried robots.txt and .htaccess rules. Yandex never looked at robots.txt, and not all of the user agents matched Yandex, so we saw over 600 connections from their IPs alone, which effectively DDoS'd the server, as there were no connections left for Apache. Now, after blocking Yandex, we are seeing an average of about 40 connections total to the web server at any given time. Much more reasonable.

I have never worked with Hiawatha before, though I have heard of it. The problem with swapping to it is that we use cPanel and we have other people on the server, so I don't know how it would play with those two things. Ideally I would like to switch to LiteSpeed, but that is an additional monthly cost, and honestly the 2 or 3 accounts that have signed up over the life of promoting on HPR don't cover the out-of-pocket expenses. I'm not complaining, I do it because I want to support the community, but it does mean that I have to cap resources like connection speed, and it also means I need to watch my spend on added things. (So far Corey hasn't said anything, but I would like not to give him a reason.)

Please forgive any spelling or grammar errors. I typed this on my phone while feeding my doggos and I have yet to fully wake up.

--Josh

On Sep 6, 2017 5:41 AM, "Claes Wallin (韋嘉誠)" <[email protected]> wrote:

I listened to twit.tv/floss448 today, about the Hiawatha web server. The author mentioned having simple built-in options for deprioritizing or blocking misbehaving clients. I don't know if it would have helped against a looping spider, but from that general description it sounds like it would.

I don't know what the site needs beyond that, just wanted to mention it as it was such a coincidence.

--
/c

On Sep 6, 2017 16:09, "Ken Fallon" <[email protected]> wrote:

> On 2017-09-04 23:29, Mike Ray wrote:
> > Hello
> >
> > It took me over an hour to download last night's community news.
> >
> > I know the community news is usually a long one but the download time is
> > longer than it should be, and has been for several weeks.
> >
> > I guess insomnia means I may be alone in pressing the download button
> > when the clock ticks past 01:00 daylight saving time here in the UK so
> > others may not have noticed.
> >
> > I am guessing somebody is grabbing the whole lot each night.
> >
> > Mike
> >
>
> Hi All,
>
> We (ok Josh who is the real hero) found out what is going on. It turns
> out that a badly written bot was stuck in a recursive loop on the site.
>
> https://en.wikipedia.org/wiki/Yandex
>
> http://www.webhostingtalk.com/showthread.php?t=924727
>
> --
> Regards,
>
> Ken Fallon
> http://kenfallon.com
> http://hackerpublicradio.org/correspondents.php?hostid=30

_______________________________________________
Hpr mailing list
[email protected]
http://hackerpublicradio.org/mailman/listinfo/hpr_hackerpublicradio.org
