Dixi quod… (I said that…)
> I don’t expect to reply much (if I’m even allowed after this) here.
… but I think I have to make one addition. I don’t normally read the list (too much traffic, not enough spoons), but I wanted to check on the web interface whether my mail had made it through, and I saw the mail from peb.

The legal aspect (when it is not used as a distraction or waved away) is somewhat misrepresented. Yes, the TDM (text and data mining) exception gives an exception to copyright for training models… to analyse things, for trends and the like. Nowhere does it allow using the models to produce output.

And please, do not use the word “generate”: they don’t generate (generative art is something entirely different, and good), they regurgitate. LLMs are a sort of lossy compressor/decompressor, with the decompression attempting a best *average* match to continue the “prompt” (it’s really just autocomplete with sparks). https://explainextended.com/2023/12/31/happy-new-year-15/ demonstrates very nicely how they actually work, using a model that is actually obtainable as an example.

Incidentally, this is also why their output alone is not copyrightable as a new work: it is produced by a deterministic machine, not a human, and therefore does not pass the threshold of originality… in two different ways, one in the legal meaning of that term, the other in the ordinary meaning of “originality”: there’s nothing new there, it merely r e g u r g i t a t e s from its inputs. (If the companies didn’t filter the possible prompts, it would be easy to extract near-complete copies of individual “training data” items by the millions, as studies have shown.)

⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠ ⚠

HOWEVER, this does not mean that their output is free from copyright. Rather, due to the above-mentioned properties (machine transformation of copyrighted works), the sum of all outputs from such a model is a derivative work of all of its inputs (and how far this holds for each individual combination of input and output naturally depends on the prompt, the PRNG seed and the output in question).
This does not, of course, give you carte blanche to just use *any* of its output… not even small parts of it. Citation rules do exist, after all. The academics in particular should know some…

So. bye,
//mirabilos
-- 
"Using Lynx is like wearing a really good pair of shades: cuts out the glare and harmful UV (ultra-vanity), and you feel so-o-o COOL." -- Henry Nelson, March 1999

