Large Language Models (LLMs) and their agentic frameworks are
increasingly adopted to automate software development tasks such as
issue resolution and program repair. While prior work has identified
security risks in LLM-generated code, most evaluations have focused on
synthetic or isolated settings, leaving open questions about the
security of these systems in real-world development contexts. 

In this study, we present the first large-scale security analysis of
LLM-generated patches using 20,000+ issues from the SWE-bench dataset.
We evaluate patches produced by a standalone LLM (Llama 3.3) and
compare them to developer-written patches. We also assess the security
of patches generated by three top-performing agentic frameworks
(OpenHands, AutoCodeRover, HoneyComb) on a subset of our data. 
Finally, we analyze a wide range of code, issue, and project-level
factors to understand the conditions under which LLMs and agents are
most likely to generate insecure code.

Our findings reveal that the standalone LLM introduces nearly 9x more
new vulnerabilities than developers, with many of these exhibiting
unique patterns not found in developers' code. Agentic workflows also
generate a significant number of vulnerabilities, particularly when
LLMs are granted more autonomy, which can increase the likelihood of
misinterpreting project context or task requirements. We find that
vulnerabilities are more likely to occur in LLM patches associated with
a higher number of files, more lines of generated code, and GitHub
issues that lack specific code snippets or information about the
expected code behavior and steps to reproduce. These results suggest
that contextual factors play a critical role in the security of the
generated code and point toward the need for proactive risk assessment
methods that account for both code and issue-level information to
complement existing vulnerability detection tools.

https://arxiv.org/pdf/2507.02976
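
As a rough illustration of the "proactive risk assessment methods" the
abstract calls for, here is a minimal, hypothetical sketch (not from the
paper) that scores a patch on the code- and issue-level factors the study
associates with insecure LLM patches. The PatchContext fields, weights,
and thresholds are all illustrative assumptions, not the authors' method.

    # Hypothetical sketch, not from the paper: a toy risk score built from
    # the factors the study links to insecure LLM patches. All names,
    # weights, and thresholds below are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class PatchContext:
        files_touched: int                 # files modified by the patch
        lines_generated: int               # lines of LLM-generated code
        issue_has_code_snippet: bool       # issue includes code snippets?
        issue_has_expected_behavior: bool  # issue describes expected behavior?
        issue_has_repro_steps: bool        # issue gives steps to reproduce?

    def risk_score(ctx: PatchContext) -> float:
        """Heuristic 0..1 score: higher means the patch deserves extra review."""
        score = 0.0
        # Larger patches (more files, more generated lines) were associated
        # with more vulnerabilities in the study's findings.
        score += 0.3 * min(ctx.files_touched / 10, 1.0)
        score += 0.3 * min(ctx.lines_generated / 500, 1.0)
        # Issues lacking code snippets, expected behavior, or reproduction
        # steps were also associated with more insecure patches.
        score += 0.0 if ctx.issue_has_code_snippet else 0.15
        score += 0.0 if ctx.issue_has_expected_behavior else 0.15
        score += 0.0 if ctx.issue_has_repro_steps else 0.10
        return min(score, 1.0)

    ctx = PatchContext(files_touched=4, lines_generated=220,
                       issue_has_code_snippet=False,
                       issue_has_expected_behavior=True,
                       issue_has_repro_steps=False)
    print(f"risk: {risk_score(ctx):.2f}")  # e.g. gate a security review on this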


LLMs introduce up to 9 times more security vulnerabilities
than a programmer does.

When you hear an entrepreneur or a manager claim that
AI increases developer productivity, you can deduce
with certainty that at least one of the following statements is true:

- they have enormous problems in their hiring process and end up
  with extraordinarily incompetent developers (and therefore their
  products are useless or dangerous)

- they sell LLMs, or software based on them

- they are parroting hype they read online to look cool

- they are lying


Giacomo
