Re: [Idea] AUR vetting process using LLM models

DrasLorus Fri, 29 May 2026 07:18:41 -0700

Hi all,

FWIW, I too think that LLM usage (custom / homemade or outsourced to Google / Anthropic / MS / OpenAI) would be inefficient. I get why LLMs could seem interesting here, but considering the way the current attacks work, a simple rule-based intrusion detection system, run from a git hook or something similar, may do the trick just fine.

As a previous answer pointed out, simply flagging and freezing for review any PKGBUILD with IOCs such as dependencies added after adoption, or the addition of known attack vectors (e.g. the NPM registry, downloads in the build and package functions, and the like), might already prevent a lot of those recent attacks. Those kinds of scans could be done by parsing the patch instead of the PKGBUILD itself, since the current attacks are less like the long-running XZ infiltration and more like pip/NPM token hacking followed by coarse payload injection. An AV could also be run on new sources to block classic payload injection, for instance. This may also prevent (or at least make harder) supply chain attacks that poison a previously trustworthy URL, but it would require infrastructure, and that has a cost.
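To make the patch-based idea concrete, here is a minimal sketch in Python of what such a rule-based scan could look like. Everything in it is an assumption for illustration: the rule names, the regular expressions, and the `scan_patch` helper are hypothetical and are not an existing AUR tool or policy. It only looks at lines a unified diff adds, and it does not know whether the package was recently adopted, which a real hook would need to get from the maintainer history.

```python
import re

# Hypothetical rules: the names and patterns are illustrative assumptions,
# not an actual AUR policy or a complete IOC list.
RULES = {
    "new-dependency": re.compile(r"^\s*(?:make|check|opt)?depends(?:_\w+)?\+?=\("),
    "npm-registry": re.compile(r"registry\.npmjs\.org|\bnpm\s+(?:install|i)\b"),
    "network-fetch": re.compile(r"\b(?:curl|wget)\b"),
    "pipe-to-shell": re.compile(r"\|\s*(?:ba|z)?sh\b"),
}


def added_lines(patch: str):
    """Yield the lines a unified diff adds, without the leading '+'."""
    for line in patch.splitlines():
        # '+++ b/PKGBUILD' is a file header, not added content.
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def scan_patch(patch: str) -> list[tuple[str, str]]:
    """Return (rule name, offending line) pairs for every added line that matches."""
    hits = []
    for line in added_lines(patch):
        for name, pattern in RULES.items():
            if pattern.search(line):
                hits.append((name, line.strip()))
    return hits
```

A server-side hook could call `scan_patch` on each pushed diff and put the package on hold for review when the result is non-empty. Note that the dependency rule fires on any touched `depends` array, so on its own it would be noisy; combining it with "maintainer changed recently" is what would turn it into the IOC described above.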

Considering the scale of the AUR, I fear that LLM parsing of PKGBUILDs is likely to exhaust the moderating team if the detection threshold is too low.
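The concern about reviewer load can be illustrated with some simple base-rate arithmetic. The sketch below is in Python, and all of its inputs are hypothetical placeholders: I do not have real figures for AUR commit volume, the share of malicious commits, or the error rates of any detector. The `review_load` helper is likewise just a name chosen for this example.

```python
def review_load(commits_per_day: float,
                malicious_rate: float,
                true_positive_rate: float,
                false_positive_rate: float) -> tuple[float, float]:
    """Return (flags per day, share of flags that are really malicious).

    All four inputs are assumptions supplied by the caller; none of them
    is a measured AUR statistic.
    """
    malicious = commits_per_day * malicious_rate
    benign = commits_per_day - malicious
    true_flags = malicious * true_positive_rate
    false_flags = benign * false_positive_rate
    flagged = true_flags + false_flags
    # Precision: how many of the flagged commits deserve a reviewer's time.
    precision = true_flags / flagged if flagged else 0.0
    return flagged, precision


# Purely hypothetical inputs, chosen only to show the shape of the problem.
flagged, precision = review_load(
    commits_per_day=1000,
    malicious_rate=0.01,
    true_positive_rate=0.9,
    false_positive_rate=0.05,
)
print(f"{flagged:.1f} flags/day, {precision:.0%} of them malicious")
```

The point is only that when malicious commits are rare, even a modest false positive rate means most flags land on benign packages, whatever kind of detector produces them.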


Regards,

DrasLorus

On 29/05/2026 at 10:54, [email protected] wrote:

Hi,
I'm opposed to this idea not because it's AI, but because it's misapplied and likely to be really costly with a non-deterministic result.
What you are proposing requires the AUR to start checking every single commit via a DevSecOps CI/CD SSDLC-style pipeline, but with only one tool in it (a homebrewed AI solution).
It would be better to set up proper security tooling, or even integrate 3rd party malware detection services.
Reinventing SAST and DAST scanners here via self-hosted or costly 3rd party LLM providers seems like a very wrong approach.
A solution here should focus on clustering and detection of compromised/malicious accounts for rapid takedowns or a moderation hold, on preventing malware from being included by blacklisting known IOCs in packages, and lastly on proper automated scanning of existing packages for malicious activity.
AI and LLMs have a role in supporting those three tasks as portions of the tools, not as the solution itself.
You are welcome to scan the AUR currently and make "bot comments" on the maliciousness of packages, but I think any Arch hosted or developed solution should stay away from spinning up such expensive infrastructure to reinvent the wheel.
Regards,

Shyamin Ayesh:
Hello Everyone,

I know this is going to be a controversial idea, and I'm not much of a
writer, so bear with me here.

I've been noticing the recent wave of spam packages and malicious code
submissions hitting the AUR lately. It's getting worse, and the current
manual review process clearly doesn't scale.

So here's my possibly unpopular suggestion: *what if we used LLMs as a
first-pass filter for AUR submissions?*

*The basic idea:*
- When a PKGBUILD or install script gets submitted, an LLM scans it for
sketchy stuff like obfuscated code, curl pipes to random endpoints, crypto
miners, encoded payloads, that kind of thing.
- It doesn't replace human review. It just flags the suspicious ones so
reviewers know where to look first.
- Unlike regex-based scanners, LLMs can actually understand code intent.
They can catch things like subtle dependency hijacking or weird
post-install behavior that static tools would miss.
- Flagged packages go into a queue with the LLM's reasoning attached, not
just "blocked" but why it thinks something is off.

I get it, there are real concerns. False positives, inference costs, and
honestly just the idea of putting AI anywhere near the trust pipeline. But
I'm not saying replace anything. Just add a layer. Could be a server-side
hook on submission, or a community bot that comments on new packages. I'm
happy to help build a prototype if anyone's interested.

I know some of you are going to hate this idea, and that's fine. But the
spam problem is real and getting worse, so I figured it's worth putting out
there. Open to better ideas too.

Cheers,
Shyamin
