Michał Górny <[email protected]> writes:

> Hello,
>
> Recently we've seen quite a growth in LLM usage in upstream packages, to
> various degrees.  The exact degree differs, apparently ranging from
> using LLMs as glorified autocomplete to asking them to rewrite the whole
> project or being convinced that a chatbot is sentient.  The results are
> raising some concerns.  I've originally written to the distributions ml
> [1] to facilitate some feedback on how other distributions are handling
> it, but the major players didn't reply.
>
> Our "AI policy" [2] covers only direct contributions to Gentoo.  At the
> time, we did not consider it appropriate to start restricting what
> packages are added to Gentoo.  Still, general rules apply and some of
> the recent packages are starting to raise concerns there.  Hence I'm
> looking for your feedback.
>
> Two recent cases that impacted the Python team were autobahn and
> chardet.

I think there are cases that we can uncontroversially (*) agree should be
marked somehow. Those are:
1) LLM-assisted rewrites for copyright reasons: clearly dubious
ethically and legally;
2) where the quality has gone substantially downhill, so legal issues
aside, it's not fair to our developers or our users (or upstreams of
*other* packages) to expose them to bugs from the rewrite.

Supposing we can agree that at least some packages require marking,
there are some things we want out of that:
1) users being able to avoid such packages if they wish;
2) preventing software from relying on it (at least in some, perhaps the
default, configuration(s));
3) avoiding people packaging tainted/broken versions.

For technical ways of achieving this marking:
1) LICENSE

  I think we need a way of having proper visibility-affected-by-LICENSE.
  This already works in Portage today via our default
  ACCEPT_LICENSE="@FREE" but pkgcheck doesn't diagnose it at all.

  * https://github.com/pkgcore/pkgcheck/issues/471 ("Add LICENSE
    visibility checks ("NonsolvableLicenseDeps")")
  * https://github.com/pkgcore/pkgcheck/issues/652 ("[New Check]:
    NonFreeLicense: Optional check for non-free packages")
  * https://github.com/pkgcore/pkgcheck/issues/651 ("[New Check]:
    PackageBecameNonFree (git check for commits transitioning a package
    from free -> non-free)")

  I think this is useful anyway for the example(s) I cited in there,
  like https://bugs.gentoo.org/832778.
  
  With that, we could know when a package introduces a dependency on a
  tainted version.

  The open question is what LICENSE marker we would use, and of course
  it would need the pkgcheck checks listed above to be implemented.
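
  As a rough illustration of the kind of check I mean, here is a minimal
  Python sketch in the spirit of the NonsolvableLicenseDeps request. It
  is not pkgcheck code: the Package type, the flat dependency model
  (each dependency is a list of alternative providers), the package
  names, and the "LLM-tainted" license name are all hypothetical, and
  real LICENSE strings can contain USE conditionals and || groups that
  this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical, simplified stand-in for an ebuild."""
    name: str
    licenses: frozenset  # every listed license must be accepted
    # Each dependency is a list of package names; any one satisfies it.
    deps: list = field(default_factory=list)


def is_visible(pkg, accepted):
    """A package is visible only if all of its licenses are accepted."""
    return pkg.licenses <= accepted


def nonsolvable_license_deps(repo, accepted):
    """Return (package, dependency) pairs where a visible package has a
    dependency that no visible package can satisfy."""
    problems = []
    for pkg in repo.values():
        if not is_visible(pkg, accepted):
            continue  # a masked package cannot break visible users
        for alternatives in pkg.deps:
            if not any(
                name in repo and is_visible(repo[name], accepted)
                for name in alternatives
            ):
                problems.append((pkg.name, tuple(alternatives)))
    return problems


# "LLM-tainted" is a placeholder; the actual marker is an open question.
ACCEPTED = frozenset({"MIT", "LGPL-2.1", "GPL-2"})

REPO = {
    p.name: p
    for p in [
        Package("dev-libs/example-2", frozenset({"MIT", "LLM-tainted"})),
        Package("dev-libs/example-1", frozenset({"LGPL-2.1"})),
        Package("app-misc/foo-1", frozenset({"GPL-2"}),
                deps=[["dev-libs/example-2"]]),
        Package("app-misc/bar-1", frozenset({"GPL-2"}),
                deps=[["dev-libs/example-1", "dev-libs/example-2"]]),
    ]
}

print(nonsolvable_license_deps(REPO, ACCEPTED))
```

  Here app-misc/foo-1 is reported because its only provider is masked
  by license, while app-misc/bar-1 is fine because an untainted version
  still satisfies it.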

2) A 'features/slop' (like features/wd40) profile

  We would have a profile that masks tainted versions in its
  package.mask.

  The problem is, the tooling for this isn't great. We don't have Funtoo
  profile 'mixins'. wd40 works primarily for arches where the support
  doesn't exist for Rust at all and such profiles inherit that, rather
  than users making custom profiles that inherit it (though I know a few
  do this).
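
  To show what "a custom profile that inherits it" amounts to, here is
  a simplified Python sketch of how package.mask entries stack along a
  profile's parent chain. It is only a model, not Portage code: the
  "user/no-slop" profile, the "features/slop" profile, and the masked
  atom are hypothetical, and the only rules modelled are that parents
  apply before children and that a "-atom" line removes an inherited
  mask.

```python
def profile_chain(profile, parents):
    """Flatten a profile's ancestry, parents before children."""
    chain = []
    for parent in parents.get(profile, []):
        chain.extend(profile_chain(parent, parents))
    chain.append(profile)
    return chain


def effective_mask(profile, parents, masks):
    """Apply each profile's package.mask entries in chain order."""
    masked = []
    for name in profile_chain(profile, parents):
        for entry in masks.get(name, []):
            if entry.startswith("-"):
                # "-atom" drops a mask inherited from an earlier profile.
                if entry[1:] in masked:
                    masked.remove(entry[1:])
            elif entry not in masked:
                masked.append(entry)
    return masked


# Hypothetical user profile listing two parents, as a 'parent' file would.
PARENTS = {
    "user/no-slop": ["default/linux/amd64/23.0", "features/slop"],
}
# Hypothetical mask carried by the features profile.
MASKS = {
    "features/slop": [">=dev-libs/example-2"],
}

print(effective_mask("user/no-slop", PARENTS, MASKS))
print(effective_mask("default/linux/amd64/23.0", PARENTS, MASKS))
```

  The mask only takes effect for profiles that list features/slop as a
  parent, which is why users would have to build such a profile
  themselves.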

3) PROPERTIES, RESTRICT

  This has come up in the thread here and I think it could perhaps work
  in the same way as bindist does.

  I don't think we have visibility checks for this though, so it has
  the same problem as LICENSE (option 1).

4) metadata.xml

  Also proposed in this thread.
  
  Having metadata.xml influence dependency resolution would be
  unexpected, and I think any solution that doesn't affect dep
  resolution is going to be insufficient, because it makes it easy to
  introduce a dep on something you don't necessarily know is tainted
  (say, some new API in >=chardet-7). I also think users do want a way
  to opt out of these packages anyway.

(*) I consider these to hopefully have rough consensus even if
the app-admin/keepassxc case doesn't.

So, to me, it seems LICENSE is the best option even if not ideal, as it
gives us most of what we want, and it's the closest to working.

> [...]

thanks,
sam

