This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-steward.git


The following commit(s) were added to refs/heads/main by this push:
     new 87ee836  fix(skill-validator): preserve blank lines inside block scalars in parse_frontmatter (#238)
87ee836 is described below

commit 87ee836c08952e0e8e273fbc3e2aace57ba5f0cd
Author: André Ahlert <[email protected]>
AuthorDate: Wed May 20 05:19:03 2026 -0300

    fix(skill-validator): preserve blank lines inside block scalars in parse_frontmatter (#238)
    
    `parse_frontmatter` treated any blank line as a terminator for the
    current key, so a block scalar with a paragraph break (a blank line
    between two indented paragraphs in a `|` or `>` value) silently lost
    everything after the first paragraph.
    
    Two consequences:
    
    1. `validate_frontmatter` measures `len(fm["description"]) +
       len(fm["when_to_use"])` against `MAX_METADATA_CHARS` (the
       Claude Code truncation budget). With the dropped paragraphs the
       measurement was too small, so a frontmatter that actually exceeds
       the budget could pass the check.
    
    2. `validate_principle_compliance` and `validate_trigger_preservation`
       only saw the truncated string, missing forbidden patterns or
       trigger phrasing that lived in the dropped paragraphs.
    
    No existing SKILL.md hit the bug, so it was latent. However, the
    `init_skill.py` scaffolding emits multi-line `description` and
    `when_to_use` blocks, so any author writing a two-paragraph
    description would have been silently mis-validated.
    
    Fix: in real YAML, a blank line inside a block scalar is part of the
    value, not a terminator. Only a new top-level key finalises the
    current value. Append blank lines to the value lines and let
    `.strip()` at finalisation discard leading/trailing blanks so
    single-line values are unaffected.
    
    Adds a regression test that asserts a `|` block scalar with an
    internal blank line keeps both paragraphs.
---
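The first consequence in the commit message can be illustrated with a
small sketch. It is not the real `validate_frontmatter`: the helper
`within_budget`, the `<=` comparison, and the budget value of 60 are
assumptions made for illustration, since this commit shows neither the
check itself nor the actual value of `MAX_METADATA_CHARS`.

```python
# Hypothetical budget, chosen only for this illustration. The real
# MAX_METADATA_CHARS value is not shown in this commit.
HYPOTHETICAL_BUDGET = 60


def within_budget(description: str, when_to_use: str, budget: int) -> bool:
    """Return True when the combined metadata length fits the budget.

    Assumed shape of the check: sum of the two lengths compared to a limit.
    """
    return len(description) + len(when_to_use) <= budget


# What the author wrote: two paragraphs separated by a blank line.
full_description = "Paragraph one.\n\nParagraph two, which used to be dropped."
# What the old parser kept: only the text before the first blank line.
truncated_description = full_description.split("\n\n", 1)[0]
when_to_use = "Use when validating skills."

# The truncated text fits the budget even though the full text does not.
print(within_budget(truncated_description, when_to_use, HYPOTHETICAL_BUDGET))
print(within_budget(full_description, when_to_use, HYPOTHETICAL_BUDGET))
```

With these sample strings the truncated measurement is 41 characters and
the full one is 83, so only the truncated version passes a 60-character
limit.
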
 .../src/skill_validator/__init__.py                | 11 ++++++----
 tools/skill-validator/tests/test_validator.py      | 24 ++++++++++++++++++++++
 2 files changed, 31 insertions(+), 4 deletions(-)
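
The fix described in the commit message can be sketched as a standalone
parser. This is a simplified illustration, not the project's
`parse_frontmatter`: the function name `parse_frontmatter_sketch` and
the `keep_blank_lines` switch are hypothetical, and only top-level
`key: value` pairs with indented continuation lines are handled. The
switch exists purely to compare the old and new blank-line behaviour.

```python
from __future__ import annotations


def parse_frontmatter_sketch(text: str, keep_blank_lines: bool = True) -> dict[str, str] | None:
    """Parse a tiny subset of YAML frontmatter (hypothetical helper)."""
    lines = text.split("\n")
    if not lines or lines[0].strip() != "---":
        return None

    result: dict[str, str] = {}
    current_key: str | None = None
    current_value_lines: list[str] = []

    def finalise() -> None:
        # Store the accumulated value; .strip() drops leading/trailing blanks.
        nonlocal current_key, current_value_lines
        if current_key is not None:
            result[current_key] = "\n".join(current_value_lines).strip()
        current_key = None
        current_value_lines = []

    for raw_line in lines[1:]:
        line = raw_line.rstrip()
        if line == "---":  # closing delimiter
            finalise()
            return result
        if line == "":
            if current_key is not None:
                if keep_blank_lines:
                    # New behaviour: the blank line is part of the value.
                    current_value_lines.append("")
                else:
                    # Old behaviour: the blank line terminates the value.
                    finalise()
            continue
        if not line[0].isspace() and ":" in line:
            # A new top-level key finalises the previous value.
            finalise()
            key, _, value = line.partition(":")
            current_key = key.strip()
            value = value.strip()
            # Block scalar indicators carry no text of their own.
            current_value_lines = [] if value in ("|", ">") else [value]
            continue
        if current_key is not None:
            current_value_lines.append(line.strip())
    return None  # closing delimiter never found


sample = (
    "---\n"
    "name: my-skill\n"
    "description: |\n"
    "  Paragraph one.\n"
    "\n"
    "  Paragraph two, which used to be dropped.\n"
    "license: Apache-2.0\n"
    "---\n"
)
new = parse_frontmatter_sketch(sample, keep_blank_lines=True)
old = parse_frontmatter_sketch(sample, keep_blank_lines=False)
print(repr(new["description"]))
print(repr(old["description"]))
```

In this sketch the old mode keeps only the first paragraph, because the
indented lines that follow the blank line no longer have a current key
to attach to, while the new mode keeps both paragraphs.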

diff --git a/tools/skill-validator/src/skill_validator/__init__.py b/tools/skill-validator/src/skill_validator/__init__.py
index a93e034..3b16ffe 100644
--- a/tools/skill-validator/src/skill_validator/__init__.py
+++ b/tools/skill-validator/src/skill_validator/__init__.py
@@ -291,12 +291,15 @@ def parse_frontmatter(text: str) -> dict[str, str] | None:
         # Strip trailing whitespace but keep leading (for folded scalars)
         line = raw_line.rstrip()
 
-        # Empty line ends a folded scalar
+        # Blank line: in real YAML, a blank line inside a block scalar
+        # is part of the value, not a terminator. Only a new top-level
+        # key finalises the current value. Preserve the blank so
+        # multi-paragraph descriptions are measured and validated in
+        # full; a trailing/leading blank is removed by `.strip()` at
+        # finalisation, so single-line values are unaffected.
         if line == "":
             if current_key is not None:
-                result[current_key] = "\n".join(current_value_lines).strip()
-                current_key = None
-                current_value_lines = []
+                current_value_lines.append("")
             continue
 
         # New top-level key?
diff --git a/tools/skill-validator/tests/test_validator.py b/tools/skill-validator/tests/test_validator.py
index 8189ce8..baa572a 100644
--- a/tools/skill-validator/tests/test_validator.py
+++ b/tools/skill-validator/tests/test_validator.py
@@ -78,6 +78,30 @@ class TestParseFrontmatter:
         assert "First line" in fm["description"]
         assert "Second line" in fm["description"]
 
+    def test_block_scalar_preserves_internal_blank_line(self) -> None:
+        """Blank lines inside a ``|`` block scalar are part of the value.
+
+        Regression: the parser used to treat any blank line as a
+        terminator, silently dropping everything after the first
+        paragraph break. That made ``MAX_METADATA_CHARS`` measurement
+        and principle/trigger validation operate on truncated text.
+        """
+        text = (
+            "---\n"
+            "name: my-skill\n"
+            "description: |\n"
+            "  Paragraph one.\n"
+            "\n"
+            "  Paragraph two, which used to be dropped.\n"
+            "license: Apache-2.0\n"
+            "---\n"
+        )
+        fm = parse_frontmatter(text)
+        assert fm is not None
+        assert "Paragraph one." in fm["description"]
+        assert "Paragraph two" in fm["description"]
+        assert fm["license"] == "Apache-2.0"
+
     def test_missing_frontmatter(self) -> None:
         assert parse_frontmatter("# no frontmatter\n") is None
 
