All,

I almost replied with the Ars Technica article that Josh linked when the thread was started, but I decided not to put it out there until I had set up a test system to see if I could get that code working. A tarpit, I think, serves them right. And, of course, the whole issue is destined to share the fate of spam and spam filters, forever and ever.

It was a serendipitously timed article. Its existence at this moment in time signals to me that this isn't a "just us" problem. It's the entire planet.

-Blake-
Conducting Magic
Will consume any data format
MOBIUS

On 2/13/2025 3:10 PM, Josh Stompro via Evergreen-dev wrote:
Jeff, thanks for bringing this up on the list.

We are seeing a lot of requests like
 "GET /eg/opac/mylist/delete?anchor=record_184821&record=184821" from never-before-seen IPs, which make 1-12 requests and then stop.

They usually seem to have a random, out-of-date Chrome version in the user agent string:
Chrome/88.0.4324.192
Chrome/86.0.4240.75
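One quick way to surface those stale Chrome versions is to pull the major version out of the access logs. A minimal sketch: the sample log lines, log path, and the "below version 100" cutoff are all illustrative, and a real log format will differ.

```shell
#!/bin/sh
# Illustrative sample standing in for a real Apache/HAProxy access log.
cat > /tmp/access_sample.log <<'EOF'
1.2.3.4 - - [13/Feb/2025:15:10:01] "GET /eg/opac/mylist/delete?anchor=record_184821&record=184821 HTTP/1.1" 200 123 "-" "Mozilla/5.0 Chrome/88.0.4324.192 Safari/537.36"
5.6.7.8 - - [13/Feb/2025:15:10:02] "GET /eg/opac/home HTTP/1.1" 200 456 "-" "Mozilla/5.0 Chrome/133.0.0.0 Safari/537.36"
EOF
# Extract Chrome major versions and flag anything implausibly old
# (current Chrome releases are well above 100).
grep -oE 'Chrome/[0-9]+' /tmp/access_sample.log |
  awk -F/ '$2 < 100 { print "stale UA: Chrome/" $2 }' |
  sort | uniq -c
```

Feeding the real log through the same pipeline gives a quick count of how much traffic carries each stale version string.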

I've been trying to slow down the bots by collecting logs, grabbing all the obvious patterns, and blocking netblocks for non-US ranges. ipinfo.io offers a free country & ASN database download that I've been using to look up the ranges and countries (https://ipinfo.io/products/free-ip-database). I would be happy to share a link to our current blocklist, which has 10K non-US ranges.
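For anyone doing the same, the filtering step is a one-liner. A minimal sketch, with an inline sample standing in for the real downloaded CSV; the column layout (start_ip,end_ip,country) is an assumption and should be checked against the actual ipinfo file.

```shell
#!/bin/sh
# Illustrative sample rows; replace with the real ipinfo country CSV.
cat > /tmp/country_sample.csv <<'EOF'
start_ip,end_ip,country
1.0.0.0,1.0.0.255,AU
8.8.8.0,8.8.8.255,US
45.143.200.0,45.143.203.255,RU
EOF
# Keep every range whose country code is not US, as start-end pairs.
awk -F, 'NR > 1 && $3 != "US" { print $1 "-" $2 }' \
    /tmp/country_sample.csv > /tmp/non_us_ranges.txt
cat /tmp/non_us_ranges.txt
```

The resulting range list can then be fed to whatever enforcement layer you use (firewall, HAProxy ACL file, etc.).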

I've also been reporting the non-US bot activity to https://www.abuseipdb.com/ just to bring some visibility to these bad bots.  I noticed initially that many of the IPs that we were getting hit from didn't seem to be listed on any blocklists already, so I figured some reporting might help.  I'm curious whether Evergreen sites are getting hit from the same IPs; an Evergreen-specific blocklist would be useful.  If you look up your bot IPs on abuseipdb.com, you can see whether I've already reported any of them.

I've also been making use of blocklists from https://iplists.firehol.org/, such as:
https://iplists.firehol.org/files/cleantalk_30d.ipset
https://iplists.firehol.org/files/botscout_7d.ipset
https://iplists.firehol.org/files/firehol_abusers_1d.netset
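Those lists are easy to load on Linux via the ipset tool. A minimal sketch that turns a firehol-style file (comment lines starting with '#', one IP or CIDR per line) into an `ipset restore` script; the sample entries and the "badbots" set name are made up. Apply the result as root with `ipset restore < /tmp/badbots.restore`, then match the set from an iptables rule.

```shell
#!/bin/sh
# Illustrative sample standing in for a downloaded .ipset/.netset file.
cat > /tmp/sample.ipset <<'EOF'
# firehol-style list: comments start with '#', one IP or CIDR per line
185.220.101.1
45.155.205.0/24
EOF
# Build an `ipset restore` script: create the set, then add each entry.
{
  echo "create badbots hash:net -exist"
  grep -v '^#' /tmp/sample.ipset | sed 's/^/add badbots /'
} > /tmp/badbots.restore
cat /tmp/badbots.restore
```

A matching firewall rule would look something like `iptables -I INPUT -m set --match-set badbots src -j DROP`.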

We are using HAProxy, so I looked into the CrowdSec HAProxy bouncer (https://docs.crowdsec.net/u/bouncers/haproxy/), but I'm not sure that would help since these IPs don't seem to be on blocklists.  Then again, I may just not quite understand how CrowdSec is supposed to work.
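For what it's worth, a locally maintained blocklist file like the one above can be enforced in HAProxy itself with a file-backed ACL. A hypothetical haproxy.cfg fragment; the frontend/backend names, certificate path, and blocklist path are all made up.

```
frontend fe_opac
    bind :443 ssl crt /etc/haproxy/certs/
    # One IP or CIDR per line in the file; reload HAProxy after updating it.
    acl blocked_src src -f /etc/haproxy/blocklist.txt
    http-request deny deny_status 403 if blocked_src
    default_backend be_evergreen
```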

HAProxy Enterprise has a reCAPTCHA module that I think would allow us to feed any non-US connections that haven't connected before through a CAPTCHA, but the price of HAProxy Enterprise is out of our budget. https://www.haproxy.com/blog/announcing-haproxy-enterprise-3-0#new-captcha-and-saml-modules

There is also a fairly up-to-date project for adding CAPTCHAs through HAProxy at https://github.com/ndbiaw/haproxy-protection. This looks promising: it requires new connections to perform a JavaScript proof-of-work calculation before allowing access, which could be a good transparent way of handling it.
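The proof-of-work idea itself is simple enough to sketch in a few lines of shell, just to show the concept: the client must find a nonce whose hash of (challenge + nonce) meets a difficulty target before being let through. This is purely illustrative; haproxy-protection does the real work in JavaScript in the browser, and the challenge string and difficulty here are made up.

```shell
#!/bin/sh
challenge="evergreen-opac-demo"   # illustrative; a real challenge comes from the proxy
nonce=0
while :; do
  hash=$(printf '%s%s' "$challenge" "$nonce" | sha256sum | cut -c1-64)
  case $hash in
    00*) break ;;                 # difficulty: two leading hex zeros (~256 tries on average)
  esac
  nonce=$((nonce + 1))
done
echo "nonce=$nonce hash=$hash"
```

The server only has to hash once to verify, while the client has to iterate, which is what makes it cheap to check and expensive to farm at bot scale.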

We were taken out by ChatGPT bots back in December; their netblocks were a bit easier to block since they were not as spread out.  I recently saw this article about how some people are fighting back against bots that ignore robots.txt: https://arstechnica.com/tech-policy/2025/01/ai-haters-build-tarpits-to-trap-and-trick-ai-scrapers-that-ignore-robots-txt/

Josh

On Mon, Jan 27, 2025 at 6:33 PM Jeff Davis via Evergreen-dev <evergreen-dev@list.evergreen-ils.org> wrote:

    Hi folks,

    Our Evergreen environment has been experiencing a
    higher-than-usual volume of unwanted bot traffic in recent months.
    Much of this traffic looks like webcrawlers hitting
    Evergreen-specific URLs from an enormous number of different IP
    addresses. Judging from discussion in IRC last week, it sounds
    like other EG admins have been seeing the same thing. Does anyone
    have any recommendations for managing this traffic and mitigating
    its impact?

    Some solutions that have been suggested/implemented so far:
    - Geoblocking entire countries.
    - Using Cloudflare's proxy service. There's some trickiness in
    getting this to work with Evergreen.
    - Putting certain OPAC pages behind a captcha.
    - Deploying publicly-available blocklists of "bad bot"
    IPs/useragents/etc. (good but limited, and not EG-specific).
    - Teaching EG to identify and deal with bot traffic itself (but
    arguably this should happen before the traffic hits Evergreen).

    My organization is currently evaluating CrowdSec as another
    possible solution. Any opinions on any of these approaches?
-- Jeff Davis
    BC Libraries Cooperative
    _______________________________________________
    Evergreen-dev mailing list
    Evergreen-dev@list.evergreen-ils.org
    http://list.evergreen-ils.org/cgi-bin/mailman/listinfo/evergreen-dev

