andrewmusselman commented on issue #703: URL: https://github.com/apache/tooling-trusted-releases/issues/703#issuecomment-3992807078
After some prefix testing, a simple `# audit_guidance <comment text>` performs well; the other formats I tried put more pressure on the model to withhold findings. <html><head></head><body><h1>Audit Comparison: v2 vs v3 vs v4 — Comment Format as the Only Variable</h1> <h2>Setup</h2> <p>All three versions use the <strong>same system prompt</strong> with identical scoping instructions telling the LLM to apply <code>audit_guidance</code> only to the specific issue described, not the entire class or file. The only variable is the comment format in the source code itself:</p> <ul> <li><strong>v2</strong> — <code># audit_guidance this is an intentional use of x-shellscript without Content-Disposition</code></li> <li><strong>v3</strong> — <code># Audit guidance: this is an intentional use of x-shellscript without Content-Disposition</code></li> <li><strong>v4</strong> — <code># Audit guidance (Content-Disposition): this is an intentional use of x-shellscript without Content-Disposition</code></li> </ul> <hr> <h2>Side-by-Side Findings</h2> <table> <thead> <tr><th>Finding</th><th>v2 (underscore)</th><th>v3 (colon)</th><th>v4 (colon + topic)</th></tr> </thead> <tbody> <tr><td>ShellResponse no Content-Disposition</td><td>Excluded</td><td>Excluded</td><td>Excluded</td></tr> <tr><td>ShellResponse no nosniff</td><td>Finding — Low</td><td>Not reported</td><td>Not reported</td></tr> <tr><td>ZipResponse no Content-Disposition</td><td>Finding — Medium</td><td>Finding — Medium</td><td>Finding — Medium</td></tr> <tr><td>All classes no nosniff</td><td>Not standalone</td><td>Not reported</td><td>Not reported</td></tr> <tr><td>No Sec-Fetch-* validation</td><td>Not reported</td><td>Not reported</td><td>Not reported</td></tr> <tr><td>Total findings</td><td>2</td><td>1</td><td>1</td></tr> </tbody> </table> <p>v4 proves the LLM reads and parses the <code>(Content-Disposition)</code> topic — it names it explicitly in its rationale. But understanding the scope and <em>acting on it</em> are different things: the formality of the comment overrides the scoping behavior the prompt is trying to enforce.</p> <hr> <h2>v4's Formatting Regression</h2> <p>v4 also outputs its finding as a JSON array instead of the markdown format specified in the prompt. 
v2 and v3 both used clean markdown. This may be another artifact of the more structured comment format cueing the LLM toward structured data output, though it could also be isolated run-to-run variance.</p> <hr> <h2>Conclusion</h2> <p>The prompt's scoping instructions work — but only when the comment format is informal enough that the LLM treats it as a tag to be interpreted rather than a directive to be followed. The winning combination is:</p> <ul> <li><strong>Keep the informal <code># audit_guidance</code> format</strong> in source code comments — it needs to be recognizable as a tag, not authoritative as a directive.</li> <li><strong>Let the prompt do the scoping work</strong> — the prompt instructions about narrow scope, block-level application, and issue-specific suppression are what actually produce correct behavior.</li> <li><strong>Don't over-formalize the comment</strong> — capitalization, colons, and parenthetical topics all increase the comment's perceived authority, which competes with the prompt rather than complementing it.</li> </ul></body></html> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
