> > How's that AI's fault when your software was always inefficient at scale?
The "recent" change is that the new bots are not caching their results, they are not looking at robots.txt, they are asking for the same set of "all possible links" which is murder for any of the "show diffs" type of websites if you let them roam. This wasn't a problem (or at least not of the same magnitude) with the search engine bots, they knew that if the webserver sent a header that said "this hasn't changed" they would not collect all the same data again. The new bots do. This is a huge change for the worse, c10k or not. > The way to solve it is multifold: While this is not strictly aimed at you or this particular reply, but for anyone in this thread that has a solution to suggest, PLEASE set up something where you present tons of links that cause load, doesn't matter if it is cvsweb, github/igtlanb/gitea clone, gotweb or something else, and show the before-and-after your suggestion is applied. It is fine to have people suggest solutions, but can we have at least a bit of science in here and not strictly armchair experts debating if relayd or pf limits is the right choice. It should be totally "easy" to put a service up and apply the proposals and see if the thousands of controlled scrapers are handled or not by it, so its usable for normal people but not getting hit over the head from the scraping drones. -- May the most significant bit of your life be positive.

