Hi Phong,

On 6/8/26 10:59, Nguyễn Gia Phong wrote:
Hi Hugo,

I'm redirecting the GCD 8 PR comment thread here because
linear conversation and a talking stick do not work in long form:
https://codeberg.org/guix/guix-consensus-documents/pulls/13#issuecomment-16813481

True; thank you for your diligent work in finding these examples!

Note that I'm OK with erring on the side of caution w.r.t. the copyright. My main issue is that we should do that based on factual evidence and not FUD, and you have provided some evidence; thanks.

I personally believe that for all of these examples it is still possible to use these tools without violating copyright (by performing some due diligence). It seems the goal of these lawsuits is the opposite, to demonstrate that it is possible to use these tools to violate copyright; a premise I'm already willing to accept.

Nevertheless, these examples are a great step toward what I was looking for. I'll read them in more detail and do some investigation myself.

(The examples are quite old though; the document is from January 2024, so the examples are even older, and 2.5 years make a huge difference.)

Hugo


On 2026-05-12 at 14:40+02:00, Hugo Buddelmeijer wrote:
[LLM] realizes the request is for a "classic function body"
and returns that.  In order to accidentally end up
with problematic code, this needs to happen:

- The programmer unknowingly made a reference to problematic code,
   e.g.  the programmer coincidentally selected the same variable
   names as John Carmack.

The space of identifier names is not large, and it would not
be unlikely for functions doing similar things in different codebases
to have similar or even identical names.

On 2026-05-12 at 14:40+02:00, Hugo Buddelmeijer wrote:
- [...] LLM to figure out [...]
- [...] LLM decides [...]

I'd refrain from implying deduction capability of language models.
See also: https://openreview.net/forum?id=pMhTFUdM4G

On 2026-05-12 at 14:40+02:00, Hugo Buddelmeijer wrote:
- The code is indeed problematic, e.g. proprietary.

Free software licenses need to be complied with too,
and we need to know which project a snippet comes from for that to happen.
Quake's fast inverse square root is extremely popular,
hence it's statistically likely to show up.  For something
with less competition, it's less clear if an LLM can return
the parent project with high accuracy.

On 2026-05-12 at 14:40+02:00, Hugo Buddelmeijer wrote:
- [...] safeguards from the LLM [service] do not flag this prompt
   to deliberately/accidentally get copyrighted code

but what is the coverage of the guardrail's heuristics
(which are likely implemented as pattern matching on natural language,
like what was discovered in the recent Anthropic source leak)?

On 2026-06-08 at 08:47+02:00, Hugo Buddelmeijer wrote:
Section II.D.2 (Training/Memorization)

The only argument seems to be that the training data is "in" the model

In citation 117,

OpenAI Reply Comments at 9 n.23 (explaining that pre-trained
language models can, “on rare occasions, ‘memorize’ training data
such that it may output a verbatim excerpt of that data
when prompted with a different portion of that data.

On 2026-05-12 at 14:40+02:00, Hugo Buddelmeijer wrote:
It seems the only way to use genAI badly is to prompt them
so explicitly that the intent clearly is to get copyrighted
material out.  I don't think we need a specific pledge
to not ask genAI for copyrighted material.

See githubcopilotlitigation.com, or
https://www.courtlistener.com/docket/65669506/doe-1-v-github-inc/

The second amended complaint (200) is sadly heavily redacted,
but please pay some attention to 60-74, 84-87, 114,
155-156 within that document.

114 might prove your point of a prompt with _bad intent_,
though for all we know, it could have been

def __init__(self, string, number):
     self.string, self.number = string, number

Kind regards,
Phong

