Hello,
I’m following up on this discussion since it’s been a month
and I haven’t heard any updates.
Summarizing the situation:
- SHF has an opaque, difficult, and undocumented process for
handling name changes. I’d like to stress again that this is
*not* strictly a transgender issue (though it likely affects
transgender people more, or in worse/different ways) -- it is a
human respect issue. Many, many more cisgender people change
their names than transgender people.
- SHF gave their archive to HuggingFace, an "AI" company which is
generating derived works with no attribution or provenance, in
ways which violate both the licenses of the projects used to
train their model and the SHF principles for LLMs.
- HuggingFace wasn’t respecting requests to opt-out of their
model.
On the first point, it sounds like SHF has made concrete progress
to improve[1], which is very good to hear. If SHF continues on
this course, I think the concern is resolved.
On the third point, HuggingFace has begun honoring opt-out
requests, but is still very far behind. Also, they don’t remove
code from the older versions of their model -- it remains there
forever. This is progress, but still, not great.
On the second point, I have not seen any public statements
indicating that either SHF or HuggingFace even acknowledges the
problem. SHF’s most recent newsletter[2], published in April 2024
(after these concerns came to light), continues to tout that
StarCoder2 is "the first AI model aligned with our principles,"
which appears to be false. StarCoder2 includes both licensed and
unlicensed code, and HuggingFace’s own StarChat2 playground
produces works derivative of this code, with no attribution or
licensing information. There is also no statement or position on
the SHF news blog. Nor has HuggingFace either fixed their tools
or made a statement. This is still very much a live concern.
I have a few questions:
- Has Guix reached out to SHF to express these concerns / get a
response?
- Whether public or private, what would Guix consider to be an
acceptable response? An unacceptable response?
- How long is Guix willing to wait for a response?
Thanks,
— Ian
[1]: https://cohost.org/arborelia/post/5273879-they-are-fixing-some
[2]: https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf
Ian Eure <i...@retrospec.tv> writes:
Hi Guixy people,
I’d never heard of SWH before I started hacking on Guix last fall, and
it struck me as rather a good idea. However, I’ve seen some things
lately which have soured me on them.
They appear to be using the archive to build LLMs:
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
I was also distressed to see how poorly they treated a developer who
wished to update their name:
https://cohost.org/arborelia/post/4968198-the-software-heritag
https://cohost.org/arborelia/post/5052044-the-software-heritag
GPL’d software I’ve created has been packaged for Guix, which I assume
means it’s been included in SWH. While I’m dealing with their (IMO:
unethical) opt-out process, I likely also need to stop new copies from
being uploaded again in the future.
Is there a way to indicate, in a Guix package, that it should *never*
be included in SWH?
Is there a way to tell Guix to never download source from SWH?
I want absolutely nothing to do with them.
Thanks,
— Ian