On 5/6/26 03:44, Nguyễn Gia Phong via Development of GNU Guix and the GNU System distribution. wrote:
> Hi Greg,
>
> On 2026-05-05 at 15:04-04:00, Greg Hogan wrote:
>> I would be interested to see an analysis of incidental memoization
>> rather than the claims of extractive memoization typically presented.
>> And taking into consideration the age of the models.
>
> Same here.

The "open-slopware"[1] list that Ian shared earlier references a recent (March) article[2] in which the authors get LLMs to output verbatim text, going a bit further than earlier research. It was the most interesting part of that list for me.

First they fine-tune an LLM to produce verbatim texts from book abstracts. This is not surprising, because the text they ask it to produce is the text that was used to generate the prompt. But then they also claim that, once fine-tuned, the LLM can be triggered to output verbatim text of books that were not used in the fine-tuning (because those books are somehow 'clustered' in the weights).

Their main figure contains a graph of the "Longest Contiguous Regurgitated Span", which is at 20 words before their fine-tuning and at around 400 words after. So still far from incidental extraction, but interesting nonetheless.
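
To make that metric concrete, here is a minimal sketch of how such a span could be measured, assuming it means the longest run of consecutive words that appears in both the model output and the original text. The function name, the lowercasing and the whitespace tokenization are assumptions for illustration and are not taken from the article, whose exact definition may differ.

```python
def longest_shared_span(generated: str, reference: str) -> list[str]:
    """Return the longest run of consecutive words found in both texts.

    Hypothetical helper: words are lowercased and split on whitespace,
    which is an assumption, not the article's stated method.
    """
    gen = generated.lower().split()
    ref = reference.lower().split()

    best_len = 0   # length of the best run found so far
    best_end = 0   # index in `gen` just past the end of that run

    # prev[j] = length of the shared run ending at gen[i-2] and ref[j-1]
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(gen) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            if gen[i - 1] == ref[j - 1]:
                # Extend the run that ended at the previous word pair.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr

    return gen[best_end - best_len:best_end]


if __name__ == "__main__":
    output = "he said it was the best of times indeed"
    book = "it was the best of times it was the worst of times"
    span = longest_shared_span(output, book)
    print(len(span), " ".join(span))
```

With this definition, the "20 words" and "400 words" in the figure would correspond to the length of the returned list for a given model output and book. The quadratic loop is fine for a sketch, but a suffix-based approach would be needed for whole books.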

The way it is written makes me believe it is still not 'loophole free': if I understand correctly, the summary needed to extract the verbatim text is derived from the original, even for the books not used in the fine-tuning. So you'd still need to have the original text to begin with. (Also to get those 20 words, I think.)

The article is titled "Whack-a-Mole", but I find that silly w.r.t. our context. For this discussion (incorporating LLM code in packaged software or in Guix itself) it should be enough to make it hard (impossible?) to extract copyrighted works by accident. I'd say it is not problematic (for those purposes) if it is possible to extract copyrighted works deliberately.

Hugo

[1] https://codeberg.org/small-hack/open-slopware
[2] https://arxiv.org/html/2603.20957v2

