[cctalk] Re: Large language model (LLM) Web Scrapers

Doug McIntyre via cctalk Wed, 17 Sep 2025 07:51:55 -0700

On Wed, Sep 17, 2025 at 09:33:25AM -0400, Paul Koning via cctalk wrote:
> A web crawler that does not obey robots.txt is not a law abiding outfit.  
> Best would be to block it entirely.  If they are that dismissive of honesty, 
> they are also unlikely to pay attention to such matters as copyright and 
> intellectual property ownership.


So, you want to block the whole of the Internet, including every AI company,
since they all ignore robots.txt?

All of the AI companies have already been sued by the book publishers for
outright pirating all of Z-Lib and Scilib. Several have settled.
Meta admitted, "we only torrented the books, not shared them" (not how that
works).

Besides Cloudflare (which has a vested interest in this already), the
constant AI scraping has prompted solutions such as https://anubis.techaro.lol/,
which forces browsers to do proof-of-work before they can connect to a
website, in order to protect its content.
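
For anyone who has not run into the proof-of-work idea before, here is a rough
hashcash-style sketch in Python. It is only an illustration of the general
technique and is not taken from Anubis: the function names, the
"challenge:nonce" string format, and the difficulty measured in leading zero
bits are all my own assumptions. The point is that the client has to burn CPU
searching for a nonce, while the server can check the answer with one hash.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a non-zero byte, bit_length() gives the position of its
        # highest set bit, so the rest of the 8 bits are leading zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single SHA-256 call checks the client's work."""
    # Hypothetical message format, chosen only for this sketch.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until one meets the difficulty."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty):
            return nonce


if __name__ == "__main__":
    # Each extra bit of difficulty roughly doubles the expected client work.
    nonce = solve("example-challenge", 12)
    print(nonce, verify("example-challenge", nonce, 12))
```

A single visitor barely notices the cost, but a scraper hitting millions of
pages has to pay it on every challenge, which is the whole idea.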
