andreahlert commented on code in PR #215:
URL: https://github.com/apache/airflow-steward/pull/215#discussion_r3295135519


##########
tools/skill-validator/src/skill_validator/__init__.py:
##########
@@ -570,6 +590,61 @@ def validate_principle_compliance(path: Path, text: str) -> Iterable[Violation]:
         )
 
 
+# ---------------------------------------------------------------------------
+# Privacy-LLM gate-check (write-skill/security-checklist.md § Pattern 6)
+# ---------------------------------------------------------------------------
+
+
+def validate_privacy_patterns(path: Path, text: str) -> Iterable[Violation]:
+    """Check Privacy-LLM gate-check convention from ``write-skill/security-checklist.md``.
+
+    **Pattern 6** *(SKILL.md only)*: skills whose ``mode`` implies processing
+    external / attacker-controlled content **and** that *read full issue bodies*
+    from the private ``<tracker>`` repo must invoke the Privacy-LLM gate-check
+    (``privacy-llm-check``) before making any outbound LLM call.
+
+    Three conditions must all be true to trigger the check:
+
+    1. The file is a ``SKILL.md`` entry point.
+    2. The ``mode`` frontmatter field is one of the external-content modes
+       (``Triage``, ``Mentoring``, ``Drafting``).
+    3. The skill both references ``<tracker>`` **and** contains ``gh issue view``
+       — the command that fetches full issue bodies (embargoed CVE detail,
+       reporter PII, etc.).  Skills that only write to / query metadata from
+       the tracker (create an issue, list milestones, search titles) are exempt
+       because they never pass private issue body content to the model.
+
+    All violations are **SOFT** — advisory, surfaced as warnings without
+    failing the run unless ``--strict`` is passed.
+    """
+    # Pattern 6 is only relevant for SKILL.md entry points.
+    if path.name != "SKILL.md":
+        return
+
+    fm = parse_frontmatter(text) or {}
+    mode = fm.get("mode", "")
+    if mode not in _EXTERNAL_CONTENT_MODES:
+        return
+
+    # Only flag skills that both reference the tracker AND read full issue bodies.
+    if _TRACKER_PLACEHOLDER not in text:
+        return
+    if _TRACKER_READ_PHRASE not in text:
+        return
+
+    if _PRIVACY_LLM_GATE_PHRASE not in text:

Review Comment:
   Thanks, the HTML-comment, prose, and inline-code cases are properly closed, 
and the regression tests match. Two gaps from the original review still stand 
though:
   
   **1. Fenced anti-example blocks still satisfy the gate.** 
`_has_privacy_gate_command` iterates every fenced block in the body with no 
section context, so a "Don't do this" / "Bad example" section containing:
   
   ````markdown
   ## Don't do this
   
   ```bash
   uv run --project <framework>/tools/privacy-llm/checker privacy-llm-check
   ```
   ````
   
   passes silently. The docstring on `validate_privacy_patterns` actually 
claims "anti-examples do not satisfy the check", but that is only true for 
anti-examples written as prose, not for fenced ones, which is the more 
dangerous shape.
   
   **2. The Prerequisites / Step 0 location is not enforced.** The error 
message tells authors to put the command "in the Prerequisites / Step 0 
section", but the validator accepts the phrase in any fenced block anywhere in 
the file (e.g. an appendix, a changelog snippet, a `## History` section). The 
original suggestion was to reuse the structural section locator you already 
have in other validators.
   
   Minimal closure for both: scope `_fenced_code_blocks` to blocks whose 
enclosing heading matches a Prerequisites / Preflight / Step 0 pattern, and/or 
exclude blocks under headings matching `(?i)don'?t|anti.?example|bad|wrong`. 
The first option subsumes the second and aligns the behavior with the error 
message.
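A minimal sketch of the first option, assuming the validator can walk the file line by line and remember the nearest preceding ATX heading. The names `fenced_blocks_with_heading`, `has_privacy_gate`, and `_GATE_SECTION_RE` are hypothetical; the real `_fenced_code_blocks` and the structural section locator are not shown in this thread and may look different.

```python
import re
from collections.abc import Iterator

# Hypothetical patterns for illustration; not the PR's actual constants.
_HEADING_RE = re.compile(r"^ {0,3}(#{1,6})[ \t]+(.*?)[ \t]*#*[ \t]*$")
_FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
_GATE_SECTION_RE = re.compile(r"\b(prerequisites?|preflight|step\s*0)\b", re.IGNORECASE)
_GATE_COMMAND = "privacy-llm-check"


def fenced_blocks_with_heading(text: str) -> Iterator[tuple[str, str]]:
    """Yield (nearest preceding heading title, block body) per closed fenced block."""
    heading = ""
    fence = None  # opening fence run while inside a block, else None
    body: list[str] = []
    for line in text.splitlines():
        m = _FENCE_RE.match(line)
        if fence is None:
            if m:
                fence, body = m.group(1), []
            else:
                h = _HEADING_RE.match(line)
                if h:
                    heading = h.group(2)
            continue
        run = m.group(1) if m else ""
        # A closing fence uses the same character, is at least as long as the
        # opener, and carries no info string.
        closes = bool(run) and run[0] == fence[0] and len(run) >= len(fence) and line.strip() == run
        if closes:
            yield heading, "\n".join(body)
            fence = None
        else:
            body.append(line)  # headings inside a block are content, not sections


def has_privacy_gate(text: str) -> bool:
    """True only if the gate command sits in a block under an allowed heading."""
    return any(
        _GATE_SECTION_RE.search(heading) and _GATE_COMMAND in body
        for heading, body in fenced_blocks_with_heading(text)
    )
```

With this shape, a block under `## Don't do this` no longer satisfies the gate, while the same block under `## Step 0: Preflight` does. The sketch only looks at the nearest heading, so a sub-heading nested inside Prerequisites would need a heading stack to be recognized.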
   
   Minor: ``_FENCED_CODE_RE = r"^[ \t]{0,3}```..."`` accepts tabs in fence 
indentation; CommonMark only allows 0-3 spaces. Harmless in practice but worth 
tightening to `` ^ {0,3}``` `` for parity with the spec.
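To make the difference concrete, here is a small comparison of the two opening-fence prefixes. `LOOSE_OPEN` and `STRICT_OPEN` are simplified stand-ins, since the full `_FENCED_CODE_RE` is elided above; only the indentation part is modeled.

```python
import re

TICKS = "`" * 3  # built indirectly so the snippet can live inside a fenced block

# Simplified stand-ins for the indentation prefix of the fence pattern.
LOOSE_OPEN = re.compile(r"^[ \t]{0,3}" + TICKS)   # current: spaces or tabs
STRICT_OPEN = re.compile(r"^ {0,3}" + TICKS)      # proposed: spaces only


def opens_fence(pattern: re.Pattern[str], line: str) -> bool:
    """Return True if the line starts a fenced block under the given pattern."""
    return pattern.match(line) is not None


tab_line = "\t" + TICKS + "bash"
space_line = "   " + TICKS + "bash"

print(opens_fence(LOOSE_OPEN, tab_line), opens_fence(STRICT_OPEN, tab_line))
print(opens_fence(LOOSE_OPEN, space_line), opens_fence(STRICT_OPEN, space_line))
```

Only the tab-indented line is treated differently; lines indented with up to three spaces match under both patterns.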


