>
> How's that AI's fault when your software was always inefficient at scale?

The "recent" change is that the new bots are not caching their
results, they are not looking at robots.txt, they are asking for the
same set of "all possible links" which is murder for any of the "show
diffs" type of websites if you let them roam. This wasn't a problem
(or at least not of the same magnitude) with the search engine bots,
they knew that if the webserver sent a header that said "this hasn't
changed" they would not collect all the same data again. The new bots
do. This is a huge change for the worse, c10k or not.
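For anyone who hasn't looked at the mechanism: it's the conditional
GET. A rough sketch below, assuming a server that sends ETag or
Last-Modified headers (the URL is a placeholder). This is what the
old search engine bots did and the new scrapers skip.

    # First fetch: remember the validators the server hands back.
    # Second fetch: send them back and accept a cheap 304 instead of
    # re-downloading (and making the server re-generate) the page.
    import urllib.request
    import urllib.error

    url = "https://example.org/some/diff/page"   # hypothetical target

    with urllib.request.urlopen(url) as resp:
        body = resp.read()
        etag = resp.headers.get("ETag")
        last_modified = resp.headers.get("Last-Modified")

    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)

    try:
        with urllib.request.urlopen(req) as resp:
            print("changed, fetched", len(resp.read()), "bytes again")
    except urllib.error.HTTPError as e:
        if e.code == 304:
            print("not modified, nothing re-fetched")
        else:
            raise

A polite crawler pays one small header exchange per unchanged page;
the new bots pay (and charge the server) the full page generation
every single time.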

> The way to solve it is multifold:

This is not strictly aimed at you or this particular reply, but for
anyone in this thread who has a solution to suggest: PLEASE set up
something that presents tons of links that cause load (it doesn't
matter if it is cvsweb, a github/gitlab/gitea clone, gotweb or
something else) and show the before and after of applying your
suggestion.

It is fine to have people suggest solutions, but can we have at least
a bit of science in here and not just armchair experts debating
whether relayd or pf limits are the right choice. It should be
totally "easy" to put a service up, apply the proposals, and see
whether thousands of controlled scrapers are handled by it or not, so
that it stays usable for normal people while not getting hit over the
head by the scraping drones.
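The "controlled scrapers" half of such an experiment can be as dumb
as the sketch below: many concurrent clients that, like the AI bots,
ignore robots.txt and conditional headers and just walk every link.
Host, paths and counts here are made up; point it only at your own
test box, run it with and without the proposed relayd/pf limits, and
compare what real users see at the same time.

    # Hammer a test instance with WORKERS concurrent "bots" walking a
    # list of expensive diff-style URLs, then report success rate and
    # worst response time.
    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    BASE = "http://testbox.example"        # hypothetical test instance
    PATHS = [f"/cgi-bin/cvsweb/file{i}?r1=1.{i}&r2=1.{i+1}"
             for i in range(1000)]
    WORKERS = 200                          # concurrent scrapers

    def scrape(path):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(BASE + path, timeout=10) as resp:
                resp.read()
                return resp.status, time.monotonic() - start
        except Exception:
            return 0, time.monotonic() - start

    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        results = list(pool.map(scrape, PATHS))

    ok = sum(1 for status, _ in results if status == 200)
    print(f"{ok}/{len(results)} requests succeeded")
    print(f"slowest: {max(t for _, t in results):.2f}s")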

-- 
May the most significant bit of your life be positive.
