Michał Górny <[email protected]> writes:

> Hello,
>
> Recently we've seen quite a grow in LLM usage in upstream packages, to
> various degrees. The exact degree differs, apparently ranging from
> using LLMs as glorified autocomplete to asking them to rewrite the whole
> project or being convinced that a chatbot is sentient. The results are
> raising some concerns. I've originally written to the distributions ml
> [1] to facilitate some feedback on how other distributions are handling
> it, but the major players didn't reply.
>
> Our "AI policy" [2] covers only direct contributions to Gentoo. At the
> time, we did not consider it appropriate to start restricting what
> packages are added to Gentoo. Still, general rules apply and some of
> the recent packages are starting to raise concerns there. Hence I'm
> looking for your feedback.
>
> Two recent cases that impacted the Python team were autobahn and
> chardet.

I think there are cases that we can uncontroversially (*) agree we want
to mark somehow:
1) LLM-assisted rewrites done for copyright reasons: clearly dubious
ethically and legally;
2) cases where the quality has gone substantially downhill, so legal
issues aside, it's not fair to our developers or our users (or upstreams
of *other* packages) to expose them to bugs from the rewrite.

Supposing we can agree that at least some packages require marking,
there are some things we want out of that:
1) users being able to avoid such packages if they wish;
2) preventing software from relying on them (at least in some, perhaps
the default, configuration(s));
3) avoiding people packaging tainted/broken versions.

For technical ways of achieving this marking:

1) LICENSE

I think we need a way of having proper visibility-affected-by-LICENSE.
This already works in Portage today via our default
ACCEPT_LICENSE="@FREE" but pkgcheck doesn't diagnose it at all.

* https://github.com/pkgcore/pkgcheck/issues/471 ("Add LICENSE visibility checks ("NonsolvableLicenseDeps")")
* https://github.com/pkgcore/pkgcheck/issues/652 ("[New Check]: NonFreeLicense: Optional check for non-free packages")
* https://github.com/pkgcore/pkgcheck/issues/651 ("[New Check]: PackageBecameNonFree (git check for commits transitioning a package from free -> non-free)")

I think this is useful anyway for the example(s) I cited in there, like
https://bugs.gentoo.org/832778.

With that, we could know when a package introduces a dependency on a
tainted version. The open question is what LICENSE marker we would use,
and of course it would need the parts I mention above implemented in
pkgcheck.

2) A 'features/slop' (like features/wd40) profile

We would have a profile that masks tainted versions in its package.mask.
The problem is, the tooling for this isn't great. We don't have Funtoo
profile 'mixins'.
wd40 works primarily for arches where Rust support doesn't exist at all,
and such profiles inherit it, rather than users making custom profiles
that inherit it (though I know a few do this).

3) PROPERTIES, RESTRICT

This has come up in the thread here and I think it could perhaps work in
the same way as bindist does. I don't think we have visibility checks
for this though, so it has the same problem as LICENSE (1).

4) metadata.xml

Also proposed in this thread. Having metadata.xml influence dependency
resolution would be unexpected, and I think any solution that doesn't
affect dep resolution is going to be insufficient, because it makes it
easy to introduce a dep on something you don't necessarily know is
tainted, like maybe some new API in >=chardet-7, and I think users do
want a way to opt out of these packages anyway.

(*) I consider these to hopefully have rough consensus even if the
app-admin/keepassxc case doesn't.

So, to me, it seems LICENSE is the best option even if not ideal, as it
gives us most of what we want, and it's the closest to working.

> [...]

thanks,
sam
