justinmclean commented on code in PR #215:
URL: https://github.com/apache/airflow-steward/pull/215#discussion_r3293905481
##########
tools/skill-validator/src/skill_validator/__init__.py:
##########
@@ -570,6 +590,61 @@ def validate_principle_compliance(path: Path, text: str) -> Iterable[Violation]:
)
+# ---------------------------------------------------------------------------
+# Privacy-LLM gate-check (write-skill/security-checklist.md § Pattern 6)
+# ---------------------------------------------------------------------------
+
+
+def validate_privacy_patterns(path: Path, text: str) -> Iterable[Violation]:
+ """Check Privacy-LLM gate-check convention from ``write-skill/security-checklist.md``.
+
+ **Pattern 6** *(SKILL.md only)*: skills whose ``mode`` implies processing
+ external / attacker-controlled content **and** that *read full issue bodies*
+ from the private ``<tracker>`` repo must invoke the Privacy-LLM gate-check
+ (``privacy-llm-check``) before making any outbound LLM call.
+
+ Three conditions must all be true to trigger the check:
+
+ 1. The file is a ``SKILL.md`` entry point.
+ 2. The ``mode`` frontmatter field is one of the external-content modes
+ (``Triage``, ``Mentoring``, ``Drafting``).
+ 3. The skill both references ``<tracker>`` **and** contains ``gh issue view``
+ — the command that fetches full issue bodies (embargoed CVE detail,
+ reporter PII, etc.). Skills that only write to / query metadata from
+ the tracker (create an issue, list milestones, search titles) are exempt
+ because they never pass private issue body content to the model.
+
+ All violations are **SOFT** — advisory, surfaced as warnings without
+ failing the run unless ``--strict`` is passed.
+ """
+ # Pattern 6 is only relevant for SKILL.md entry points.
+ if path.name != "SKILL.md":
+ return
+
+ fm = parse_frontmatter(text) or {}
+ mode = fm.get("mode", "")
+ if mode not in _EXTERNAL_CONTENT_MODES:
+ return
+
+ # Only flag skills that both reference the tracker AND read full issue bodies.
+ if _TRACKER_PLACEHOLDER not in text:
+ return
+ if _TRACKER_READ_PHRASE not in text:
+ return
+
+ if _PRIVACY_LLM_GATE_PHRASE not in text:
Review Comment:
Thanks, agreed. This is fixed now.

The gate is no longer a plain `in text` substring check. The validator now
strips HTML comments and only treats `privacy-llm-check` as satisfying the gate
when it appears inside a fenced command block, so TODOs, prose mentions, HTML
comments, and inline-code anti-examples no longer bypass the check.

I also added regression coverage for:
- `<!-- TODO: wire up privacy-llm-check -->` still failing
- prose mentions still failing
- inline-code mentions still failing
- fenced command blocks passing
- indented fenced command blocks passing

While rebasing I also fixed the fenced-code detector to handle indented
fences, since the actual Step 0/preflight command blocks live inside
numbered-list indentation.
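
As a rough illustration of the approach described above (a sketch only, not the
code in this PR; `gate_in_fenced_block` and the regex names are hypothetical,
and the real validator may differ in detail): strip HTML comments first, then
walk the lines and count the phrase only while inside a fenced block, allowing
the fence itself to be indented.

```python
import re

# Hypothetical names for illustration; not taken from the actual validator.
_HTML_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# A fence line: optional indentation, then 3+ backticks or tildes, then the rest.
_FENCE_RE = re.compile(r"^[ \t]*(`{3,}|~{3,})(.*)$")
_GATE_PHRASE = "privacy-llm-check"


def gate_in_fenced_block(text: str) -> bool:
    """Return True only if the gate phrase appears inside a fenced code block."""
    # Drop HTML comments so a commented-out TODO cannot satisfy the gate.
    visible = _HTML_COMMENT_RE.sub("", text)
    open_marker = None  # the fence marker that opened the current block, if any
    for line in visible.splitlines():
        m = _FENCE_RE.match(line)
        if m:
            marker, rest = m.group(1), m.group(2)
            if open_marker is None:
                # Opening fence (an info string such as "bash" may follow).
                open_marker = marker
                continue
            same_kind = marker[0] == open_marker[0]
            if same_kind and len(marker) >= len(open_marker) and not rest.strip():
                # Closing fence: same character, at least as long, nothing after.
                open_marker = None
                continue
        if open_marker is not None and _GATE_PHRASE in line:
            return True
    return False
```

With this shape, prose, inline code, and commented-out mentions all return
`False`, while a command inside a fence (indented or not) returns `True`.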