Re: Proposal -- Interpretation of DFSG on Artificial Intelligence (AI) Models

Wouter Verhelst Sun, 04 May 2025 04:12:17 -0700

On Tue, Apr 29, 2025 at 03:17:52PM +0200, Aigars Mahinovs wrote:
>    However, here we have a clear and fundamental change happening in the
>    copyright law level - there is a legal break/firewall that is happening
>    during training. The model *is* a derivative work of the source code of
>    the training software, but is *not* a derivative work of the training
>    data.


I would disagree with this statement. How is a model not a derivative
work of the training data? Wikipedia defines a derivative work as:

  In copyright law, a derivative work is an expressive creation that
  includes major copyrightable elements of a first, previously created
  original work (the underlying work). [1]

Since models are often able to regurgitate copyrighted works (largely)
verbatim, that definition seems to me to apply to models.

[1] https://en.wikipedia.org/wiki/Derivative_work

>    This means that we also have to consider what exactly is training
>    data and how to deal with it, without automatically falling back to
>    equating it with source code.

We have a very wide definition of "source code" in Debian. To us, source
code is not limited to software written in a common programming
language; our definition also covers things such as SVG files,
LibreOffice documents, GIMP XCF files, etc. In this context, I don't
think that equating training data with source code is too wild a thing
to do.

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.
