It's not just Evergreen sites. I had to block all traffic from Hong Kong to our system website after we saw a greater-than-10x increase in visitors overnight. I tried doing it by IP, but the IPs kept changing, so it ended up being easier to just block everything.
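For anyone else weighing country-level blocks: rather than chasing individual IPs, you can load a whole per-country CIDR list into an ipset and drop it with a single firewall rule. A minimal sketch of generating the `ipset restore` input (the set name and the two sample ranges are just placeholders; a real list would come from a GeoIP/country database):

```python
import ipaddress

def ipset_restore_script(setname, cidrs):
    """Build an `ipset restore` script that creates a hash:net set and
    adds every CIDR range, collapsing adjacent/overlapping ranges first."""
    nets = ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in cidrs)
    lines = [f"create {setname} hash:net -exist"]
    lines += [f"add {setname} {net} -exist" for net in nets]
    return "\n".join(lines) + "\n"

# Placeholder ranges -- not a real country list.
print(ipset_restore_script("block-country", ["1.64.0.0/16", "1.65.0.0/16"]))
```

Feed the output to `ipset restore`, then block the whole set with something like `iptables -I INPUT -m set --match-set block-country src -j DROP`. Collapsing adjacent ranges keeps the set small (the two /16s above merge into one /15).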
Shula Link (she/her)
Systems Services Librarian
Greater Clarks Hill Regional Library
sl...@columbiacountyga.gov | sl...@gchrl.org
706-447-6702


On Thu, Feb 13, 2025 at 4:46 PM Blake Graham-Henderson via Evergreen-dev <evergreen-dev@list.evergreen-ils.org> wrote:

> All,
>
> I almost replied with the arstechnica article that Josh linked when the thread was started. But I decided not to put it out there until I had set up a test system to see if I could get that code working. A tarpit, I think, serves them right. And, of course, the whole issue is destined to receive the fate of spam and spam filters forever and ever.
>
> It was a serendipitously timed article. Its existence at this moment in time signals to me that this isn't a "just us" problem. It's the entire planet.
>
> -Blake-
> Conducting Magic
> Will consume any data format
> MOBIUS
>
> On 2/13/2025 3:10 PM, Josh Stompro via Evergreen-dev wrote:
>
> Jeff, thanks for bringing this up on the list.
>
> We are seeing a lot of requests like "GET /eg/opac/mylist/delete?anchor=record_184821&record=184821" from never-seen-before IPs, and they make 1-12 requests and then stop.
>
> They usually have a random out-of-date Chrome version in the user agent string:
> Chrome/88.0.4324.192
> Chrome/86.0.4240.75
>
> I've been trying to slow down the bots by collecting logs, grabbing all the obvious patterns, and blocking netblocks for non-US ranges. ipinfo.io offers a free country & ASN database download that I've been using to look up the ranges and countries (https://ipinfo.io/products/free-ip-database). I would be happy to share a link to our current blocklist, which has 10K non-US ranges.
>
> I've also been reporting the non-US bot activity to https://www.abuseipdb.com/ just to bring some visibility to these bad bots.
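Josh's lookup step can be sketched in a few lines. I'm assuming the country database exports rows with start_ip/end_ip/country fields (roughly what the ipinfo free CSV provides); the helper names are mine. A sorted list plus binary search is enough to classify bot IPs in bulk:

```python
import bisect
import ipaddress

def load_ranges(rows):
    """rows: iterable of dicts with start_ip, end_ip, country keys.
    Returns a sorted list of (start_int, end_int, country) tuples."""
    ranges = []
    for row in rows:
        start = int(ipaddress.ip_address(row["start_ip"]))
        end = int(ipaddress.ip_address(row["end_ip"]))
        ranges.append((start, end, row["country"]))
    ranges.sort()
    return ranges

def country_for(ip, ranges):
    """Binary-search the sorted ranges for the one containing `ip`."""
    n = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(ranges, (n, float("inf"), "")) - 1
    if i >= 0 and ranges[i][0] <= n <= ranges[i][1]:
        return ranges[i][2]
    return None
```

With `csv.DictReader` over the downloaded file you'd build the list once, then run every never-seen-before bot IP through `country_for` to decide which netblocks go on the blocklist.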
> I noticed initially that many of the IPs we were getting hit from didn't seem to be listed on any blocklists already, so I figured some reporting might help. I'm kind of curious whether other Evergreen sites are getting hit from the same IPs, so an Evergreen-specific blocklist would be useful. If you look up your bot IPs on abuseipdb.com you can see if I've already reported any of them.
>
> I've also been making use of block lists from https://iplists.firehol.org/ such as:
> https://iplists.firehol.org/files/cleantalk_30d.ipset
> https://iplists.firehol.org/files/botscout_7d.ipset
> https://iplists.firehol.org/files/firehol_abusers_1d.netset
>
> We are using HAProxy, so I did some looking into the CrowdSec HAProxy Bouncer (https://docs.crowdsec.net/u/bouncers/haproxy/), but I'm not sure that would help, since these IPs don't seem to be on blocklists. But I may just not quite understand how CrowdSec is supposed to work.
>
> HAProxy Enterprise has a ReCaptcha module that I think would allow us to feed any non-US connections that haven't connected before through a ReCaptcha, but the price for HAProxy Enterprise is out of our budget.
> https://www.haproxy.com/blog/announcing-haproxy-enterprise-3-0#new-captcha-and-saml-modules
>
> There is also a fairly up-to-date project for adding captchas through HAProxy at https://github.com/ndbiaw/haproxy-protection. It looks promising as a transparent method: it requires new connections to perform a JavaScript proof-of-work calculation before allowing access.
>
> We were taken out by ChatGPT bots back in December; their netblocks were a bit easier to block since they were not as spread out.
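Those firehol lists are plain text (one IP or CIDR per line, `#` lines are comments), so it's easy to cross-check your own bot logs against them before deciding whether a bouncer that only consults public blocklists would help. A rough sketch (helper names are mine):

```python
import ipaddress

def parse_netset(lines):
    """Parse firehol .ipset/.netset content: one IP or CIDR per line,
    lines starting with '#' are comments."""
    nets = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        nets.append(ipaddress.ip_network(line, strict=False))
    return nets

def is_listed(ip, nets):
    """True if `ip` falls inside any network in the parsed list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)
```

Running each suspect IP from the logs through `is_listed` (e.g. against `parse_netset(open("firehol_abusers_1d.netset"))`) gives a hit rate; a low one would confirm the observation that these bots mostly aren't on the public lists yet.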
> I recently saw this article about how some people are fighting back against bots that ignore robots.txt:
> https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/
>
> Josh
>
> On Mon, Jan 27, 2025 at 6:33 PM Jeff Davis via Evergreen-dev <evergreen-dev@list.evergreen-ils.org> wrote:
>
>> Hi folks,
>>
>> Our Evergreen environment has been experiencing a higher-than-usual volume of unwanted bot traffic in recent months. Much of this traffic looks like webcrawlers hitting Evergreen-specific URLs from an enormous number of different IP addresses. Judging from discussion in IRC last week, it sounds like other EG admins have been seeing the same thing. Does anyone have any recommendations for managing this traffic and mitigating its impact?
>>
>> Some solutions that have been suggested/implemented so far:
>> - Geoblocking entire countries.
>> - Using Cloudflare's proxy service. There's some trickiness in getting this to work with Evergreen.
>> - Putting certain OPAC pages behind a captcha.
>> - Deploying publicly-available blocklists of "bad bot" IPs/useragents/etc. (good but limited, and not EG-specific).
>> - Teaching EG to identify and deal with bot traffic itself (but arguably this should happen before the traffic hits Evergreen).
>>
>> My organization is currently evaluating CrowdSec as another possible solution. Any opinions on any of these approaches?
>> --
>> Jeff Davis
>> BC Libraries Cooperative
>> _______________________________________________
>> Evergreen-dev mailing list
>> Evergreen-dev@list.evergreen-ils.org
>> http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-dev
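On the tarpit idea from the article Josh and Blake mention: the core trick is just an endless chain of cheap, slowly served pages that each link to the next, so a crawler that ignores robots.txt wanders forever. A toy sketch of the page generator (not the actual Nepenthes code from the article; the names and the /trap/ path are made up):

```python
import itertools
import random
import time

def tarpit_pages(seed=0, delay=0.0):
    """Yield an endless sequence of small HTML pages, each linking to the
    next /trap/ URL, with an optional per-page delay to waste crawler time."""
    rng = random.Random(seed)
    words = ["catalog", "record", "list", "item", "holds", "search"]
    for i in itertools.count(1):
        filler = " ".join(rng.choice(words) for _ in range(20))
        if delay:
            time.sleep(delay)  # drip-feed: the real cost to the bot is time
        yield (f"<html><body><p>{filler}</p>"
               f'<a href="/trap/{i}">next</a></body></html>')
```

A web server hook would serve page i at /trap/i; the delay, not the content, is what makes it a tarpit. Worth keeping the article's caveat in mind: this burns your own CPU and bandwidth too.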