The way I look at it, for my personal site and content:

1. I'm not going to win the arms race. It's not my area of expertise. I
can't put hours into devising anti-bot countermeasures.

2. If I do try to implement countermeasures to prevent bots, I will likely
also end up impacting/inconveniencing some legitimate users.

3. Ultimately my goal is to help people, so if my content ends up training
an AI model, and that model ends up helping people, then I'm indirectly
meeting my goal.

4. Many of the AIs are now citing their sources, which means I get some
level of attribution and recognition.

5. Some archives, such as the Wayback Machine, I find extremely useful for
vintage computer research. People die. Providers shut down. A lot of
knowledge has been lost. I'll be happy if my content eventually outlives
me.

I wish there were more focus on (4). Everyone deserves recognition of
their work and their content. I'd support legislation requiring that
sources be cited/acknowledged when AI results are returned.

I think there's some risk of "content laundering": a bot is trained on
your content, someone publishes an AI-generated article, and the next bot
is trained on that AI-generated content, losing the original attribution.
Without discipline, it can turn into a bunch of slop whose origin and
accuracy nobody can verify.

Scott

On Wed, Sep 17, 2025 at 11:31 AM Bill Degnan via cctalk <
[email protected]> wrote:

> On Wed, Sep 17, 2025 at 1:27 PM The Doctor via cctalk <
> [email protected]> wrote:
>
> > On Tuesday, September 16th, 2025 at 17:01, Bill Degnan via cctalk <
> > [email protected]> wrote:
> >
> > > I wonder how long the WWW will remain open, it would be a bummer if I
> > > found copies of my site elsewhere.
> >
> > I've been thinking about this myself. It does not please me.
> >
> > What web server do you use for your site? I've got some pretty robust
> > but easy to admin countermeasures set up on my own website that I'd be
> > happy to share if there is interest.
> >
>
> I run a web services company; vintagecomputer.net is internally supported.
> vintagecomputer.net has been dealing with some sort of scrapers for 20
> years. The site is privately hosted and has web scraping control measures
> built to detect a whole array of bot activity. Rather than block, I
> believe it's better to detect and log, and then determine how best to
> manage new types of bot probing and scraping on an ongoing basis. It's a
> great way to learn white-hat hacking.
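The "detect and log rather than block" approach described above can be
sketched as a small request classifier. This is a hypothetical
illustration, not the actual measures on vintagecomputer.net: the
signature list, function name, and log format here are all assumptions.

```python
import logging
import re

# Hypothetical signature list -- a real deployment would curate a far
# larger set and update it as new bot probing patterns are observed.
BOT_SIGNATURES = [
    re.compile(pat, re.IGNORECASE)
    for pat in (r"GPTBot", r"CCBot", r"Bytespider", r"curl/", r"python-requests")
]

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("botwatch")


def classify_request(remote_addr: str, user_agent: str, path: str) -> bool:
    """Return True if the request looks like a bot.

    Matching requests are logged for later review, but the caller still
    serves the response normally -- detect and log, don't block.
    """
    is_bot = any(sig.search(user_agent or "") for sig in BOT_SIGNATURES)
    if is_bot:
        log.info("bot-detected ip=%s ua=%r path=%s", remote_addr, user_agent, path)
    return is_bot
```

The point of returning the classification instead of raising an error is
that the server's behavior is unchanged; the operator reviews the log
periodically to decide how to handle each new scraper.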
