Ian Jackson <ijack...@chiark.greenend.org.uk> writes:
> Lumin writes:

>>  1. Is GPL-licensed pretrained neural network REALLY FREE? Is it really
>>     DFSG-compatible?

> No.  No.

> Things in Debian main should be buildable *from source* using Debian
> main.  In the case of a pretrained neural network, the source code is
> the training data.

> In fact, they are probably not redistributable unless all the training
> data is supplied, since the GPL's definition of "source code" is the
> "preferred form for modification".  For a pretrained neural network
> that is the training data.

I'm not sure I disagree, but I think it's worth poking at this a bit and
seeing if it holds up when extended by analogy to other precomputed data.

Suppose that NASA releases a database of features of astronomical objects,
described as a bunch of numeric parameters.  Suppose that database is
released in the public domain or some other obviously free license
allowing use, derivative works, and everything else we expect.  However,
suppose that data is derived (using significant computational resources
and analysis code) from huge databases of observational data.  Suppose
that observational data is *largely* released, but it's not at all clear
what pieces were used to build the database of features, and the code to
do so wasn't released alongside the database.

Is that database DFSG-compatible and something we could package?

(I could see a possible argument that it is but a neural network still
isn't if we think about the NASA database as facts about the physical
world, however derived, and the neural network as a program.  But I think
that's an interestingly murky distinction.)

Similar analogies may come up with databases about genome sequences.  For
a lot of scientific data, reproducing a result data set is not trivial and
the concept of "source" is pretty murky.

Russ Allbery (r...@debian.org)               <http://www.eyrie.org/~eagle/>
