On Tue, Jan 20, 2026 at 9:26 AM Svetlana Tkachenko <[email protected]> wrote:
>
> > but I suppose you need to compromise with LLM
> > robots going wild.
>
> Are they not required to follow do_not_track http headers or robots.txt ? If
> LLM robots do not obey these instructions, they should be probably reported
> to their hosting provider.

Many sites do not honor Do Not Track (DNT) or Global Privacy Control (Sec-GPC). I have both set in my browser, and I still get most of the cookie warnings and Sharing Policy (err, Privacy Policy) violations. There are notable exceptions, though -- some sites will say something like "we got your Do Not Track signal, and we are limiting data collection." But they are few and far between.

I do not know how well crawlers and bots comply with robots.txt or RFC 9309. An admin with actual experience should probably comment.

Reporting to the hosting provider is usually a dead end. I used to try to contact netblock owners before blocking a range of IP addresses. Most of the contact information was missing or incorrect. The rest of my messages just went unanswered. Of all the times I sent an email asking the administrative and technical contacts to contain their customer, I only received one actual response.

Jeff

