Hi Mark,

On Tue, 22 Apr 2025, 4:00 am Mark Wielaard, <m...@klomp.org> wrote:

> Hi hackers,
>
> TL;DR: When using https://patchwork.sourceware.org or Bunsen
> https://builder.sourceware.org/testruns/ you might now have to enable
> javascript. This should not impact any scripts, just browsers (or bots
> pretending to be browsers). If it does cause trouble, please let us
> know. If this works out we might also "protect" bugzilla, gitweb,
> cgit, and the wikis this way.
>
> We don't like to have to do this, but as some of you might have
> noticed, Sourceware has been fighting the new AI scraper bots since
> the start of the year. We are not alone in this.
>
> https://lwn.net/Articles/1008897/
>
> https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/
>
> We have tried to isolate services more and block various IP blocks
> that were abusing the servers. But that has helped only so much.
> Unfortunately the scraper bots are using lots of IP addresses
> (probably by installing "free" VPN services that use normal user
> connections as exit points) and pretending to be common
> browsers/agents. We seem to have to make access to some services
> depend on solving a javascript challenge.
>
> So we have installed Anubis https://anubis.techaro.lol/ in front of
> patchwork and bunsen. This means that if you are using a browser that
> identifies as Mozilla or Opera in its User-Agent you will get a
> brief page showing the happy anime girl that requires javascript to
> solve a challenge and get a cookie to get through. Scripts and search
> engines should get through without it. Also, removing Mozilla and/or
> Opera from your User-Agent will get you through without javascript.
>
> We want to thank Xe Iaso, who helped us set this up and worked
> with us over the Easter weekend solving some of our problems/typos.
> Please check them out if you want to become one of their patrons as
> a thank you.
> https://xeiaso.net/notes/2025/anubis-works/
> https://xeiaso.net/patrons/


Ah, that might explain a few things. We've seen sporadic failures in the
crosstool-ng CI builds (run via a GitHub Action) where a download of the
newlib snapshot failed (but worked fine when I tried the download
manually).
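
If anyone else is hitting this from a script, Mark's note above suggests
the workaround: the challenge only triggers for User-Agents that claim to
be Mozilla or Opera, so having the fetch identify itself honestly should
avoid it entirely. A minimal sketch in Python (the URL and agent string
are placeholders, not our actual CI values):

    # Fetch a tarball with an honest, non-browser User-Agent so the
    # Anubis challenge (which keys on "Mozilla"/"Opera" in the UA,
    # per Mark's description above) never triggers.
    # NOTE: the URL and agent string below are placeholders.
    import urllib.request

    url = "https://sourceware.org/pub/newlib/newlib-snapshot.tar.gz"
    req = urllib.request.Request(
        url, headers={"User-Agent": "crosstool-ng-ci/1.0"})
    with urllib.request.urlopen(req) as resp, \
            open("newlib-snapshot.tar.gz", "wb") as out:
        out.write(resp.read())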

The good news is that this finally prompted me to look at why we were
downloading something that should have been cached. I've fixed that now,
so whatever extra load our builds were contributing should stop soon.
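
For reference, the fix boils down to checking the cache before fetching
anything; roughly like this (the cache directory, URL, and agent string
are illustrative, not our real CI layout):

    # Only download the snapshot if it isn't already in the local cache.
    # Directory, URL, and agent string are illustrative placeholders.
    import os
    import urllib.request

    CACHE_DIR = os.path.expanduser("~/.cache/crosstool-ng-tarballs")
    url = "https://sourceware.org/pub/newlib/newlib-snapshot.tar.gz"
    dest = os.path.join(CACHE_DIR, os.path.basename(url))

    if not os.path.exists(dest):
        os.makedirs(CACHE_DIR, exist_ok=True)
        req = urllib.request.Request(
            url, headers={"User-Agent": "crosstool-ng-ci/1.0"})
        with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
            out.write(resp.read())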

We might still get caught up in the bot detection when a package hosted on
sourceware.org is updated. I'm not sure if there is anything we can do
about that. I totally understand why this is necessary (AI scraper bots
have taken the crosstool-ng website down twice).
