Re: [nexa] AI Training is Copyright Infringement

Fabio Alemagna Mon, 09 Sep 2024 03:45:28 -0700

When a human being studies from a book, or from several books, they not
only grasp the meaning of the content and draw connections between what
they learned from the individual books; more often than not they can also
recite entire passages of those books, either deliberately, when they want
to quote, or at times involuntarily, simply because that is what comes
naturally when recalling what they have learned.


Semantically, there is no substantial difference from what an LLM does,
and honestly I see no anthropomorphization in using terms like "learning"
to refer to an algorithm, for two reasons:

1) Humans are not the only ones that learn; all living beings do.
2) There are definitions of "learning" based purely on concepts derived
from information theory (or, equivalently, from statistical
thermodynamics) that do not require the learner to be a living being at
all, let alone a sentient one, let alone a human.

The question that follows, then, is: does a human being infringe copyright
by learning from the texts they study?

The answer is "no", ergo there is no reason whatsoever to claim that any
other subject that learns is infringing copyright, unless one wants to
expand the concept of copyright to cover areas that have so far been
outside its reach.

On this point, here is what Creative Commons thinks:

«this method of using image-text combinations to train the AI model has an
inherently transformative purpose from the original images and should
support a finding of fair use. While these images were originally created
for their aesthetic value, their purpose for the AI model is only as data.
For the AI, these image-text pairs are only representations of how text and
images relate. What the images are does not matter for the model — they are
only data to teach the model about statistical relationships between
elements of the images and not pieces of art.»

«This is similar to how Google used digital copies of print books to create
Google Books, a practice that was challenged in Author’s Guild v. Google
(Google Books). In this case, the Second Circuit Court of Appeals found
that Google’s act of digitizing and storing copies of thousands of print
books to create a text searchable database was fair use. The court wrote
that Google’s purpose was different from the purpose of the original
authors because Google was not using the books for their content. Indeed,
the content did not really matter to Google; rather the books were like
pieces of data that were necessary to build Google’s book database.»

«it is also similar to how search engine operator Arriba Soft used copies
of images in its search engine, which was litigated in Kelly v. Arriba
Soft. In this case, a photographer, Leslie Kelly, sued the operator of a
search engine, Arriba Soft, for copying and displaying copies of her
photographs as thumbnails to users. The court, however, disagreed that this
constituted copyright infringement. Instead, the court held that this use
served a different and transformative purpose from the original purpose
because Arriba Soft only copied Kelly’s photographs to enable its search
engine to function and not because of their aesthetic value.»

https://creativecommons.org/2023/02/17/fair-use-training-generative-ai/

Fabio
