Eli Zaretskii <[email protected]> writes:

>> From: Philip Kaludercic <[email protected]>
>> Cc: [email protected],  [email protected],  [email protected],
>>   [email protected],  [email protected]
>> Date: Sat, 24 Jul 2021 07:41:21 +0000
>> 
>> > It cannot be a verbatim copy, because at least the variables, and
>> > sometimes also the data types, need to be renamed.  Whether the result
>> > is still under the original copyright cannot be established without
>> > actually comparing the two versions of the code.  So any general
>> > flat rejection of the idea of these services on these grounds is not
>> > serious, IMO.
>> 
>> Not necessarily, if it generates a pure, top-level function. Someone
>> could type something like "Sort list of postcodes" and it generates a
>> Radix Sort function. And if this is part of some code that was copied a
>> lot, the model might be even more likely to generate it verbatim.
>
> A sort function must state at least the data type before it can be
> compiled.  And if you are talking about pseudo-code that is data-type
> agnostic, then that's an algorithm, and is not copyrightable, AFAIK.

No, I was thinking about concrete code that, depending on the language,
might rely on nothing but the standard library, especially if the
language has generics. Given how often Stack Overflow code has been
found copied into unrelated repositories[0], I don't think it is
improbable that trained models would pick up these patterns.
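
A minimal sketch of what such code could look like, assuming Python
with type hints standing in for generics. The name radix_sort_by_key is
hypothetical, and the snippet is an illustration only, not taken from
any repository or model output. It is concrete and runnable, yet it
names no element type and uses only the standard library:

```python
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")  # element type is left open, as with generics


def radix_sort_by_key(items: Iterable[T], key: Callable[[T], str]) -> List[T]:
    """LSD radix sort of items by a fixed-width string key (hypothetical helper)."""
    result = list(items)
    if not result:
        return result
    width = len(key(result[0]))
    if any(len(key(item)) != width for item in result):
        raise ValueError("all keys must have the same length")
    # Walk the key from its last character to its first. Each pass keeps
    # the relative order of items that share a character, so the ordering
    # established by earlier passes is preserved.
    for pos in range(width - 1, -1, -1):
        buckets: Dict[str, List[T]] = {}
        for item in result:
            buckets.setdefault(key(item)[pos], []).append(item)
        result = [item for ch in sorted(buckets) for item in buckets[ch]]
    return result
```

The same function sorts bare postcode strings (key=lambda s: s) or
records that carry a postcode field, which is the sense in which it is
type-agnostic without being pseudo-code.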

[0] For example https://programming.guide/worlds-most-copied-so-snippet.html

-- 
        Philip Kaludercic
