This is an automated email from the ASF dual-hosted git repository.

sbp pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tooling-trusted-releases.git


The following commit(s) were added to refs/heads/main by this push:
     new 673620f  Add an internal link checker for the documentation, and fix some links
673620f is described below

commit 673620fefc687e1facb4867696d7e6a9ae4cf5a5
Author: Sean B. Palmer <[email protected]>
AuthorDate: Fri Oct 10 18:44:53 2025 +0100

    Add an internal link checker for the documentation, and fix some links
---
 Makefile                                 |   4 +-
 atr/docs/developer-guide.html            |   2 +-
 atr/docs/developer-guide.md              |   2 +-
 atr/docs/introduction-to-atr.html        |   2 +-
 atr/docs/introduction-to-atr.md          |   2 +-
 scripts/{build_docs.py => docs_build.py} |   0
 scripts/docs_check.py                    | 159 +++++++++++++++++++++++++++++++
 scripts/docs_post_process.py             |  16 ++--
 8 files changed, 176 insertions(+), 11 deletions(-)

diff --git a/Makefile b/Makefile
index 69df453..b87dfbe 100644
--- a/Makefile
+++ b/Makefile
@@ -48,13 +48,15 @@ commit:
        git push
 
 docs:
+       uv run python3 scripts/docs_check.py
        rm -f atr/docs/*.html docs/*.html
-       uv run python3 scripts/build_docs.py
+       uv run python3 scripts/docs_build.py
        for fn in atr/docs/*.md docs/*.md; \
        do \
          cmark "$$fn" > "$${fn%.md}.html"; \
        done
        uv run python3 scripts/docs_post_process.py atr/docs/*.html docs/*.html
+       uv run python3 scripts/docs_check.py
 
 generate-version:
        @rm -f atr/version.py
diff --git a/atr/docs/developer-guide.html b/atr/docs/developer-guide.html
index 5a0d296..d046b56 100644
--- a/atr/docs/developer-guide.html
+++ b/atr/docs/developer-guide.html
@@ -16,4 +16,4 @@
 <li><a href="#introduction">Introduction</a></li>
 </ul>
 <h2 id="introduction">Introduction</h2>
-<p>This is a guide for developers of ATR, explaining how to make changes to the ATR source code. For more information about how to contribute those changes back to us, please read the <a href="contribution-guide">contribution guide</a> instead.</p>
+<p>This is a guide for developers of ATR, explaining how to make changes to the ATR source code. For more information about how to contribute those changes back to us, please read the <a href="how-to-contribute">contribution guide</a> instead.</p>
diff --git a/atr/docs/developer-guide.md b/atr/docs/developer-guide.md
index 3b565eb..7ba4881 100644
--- a/atr/docs/developer-guide.md
+++ b/atr/docs/developer-guide.md
@@ -21,4 +21,4 @@
 
 ## Introduction
 
-This is a guide for developers of ATR, explaining how to make changes to the ATR source code. For more information about how to contribute those changes back to us, please read the [contribution guide](contribution-guide) instead.
+This is a guide for developers of ATR, explaining how to make changes to the ATR source code. For more information about how to contribute those changes back to us, please read the [contribution guide](how-to-contribute) instead.
diff --git a/atr/docs/introduction-to-atr.html b/atr/docs/introduction-to-atr.html
index 82df722..0be0411 100644
--- a/atr/docs/introduction-to-atr.html
+++ b/atr/docs/introduction-to-atr.html
@@ -24,4 +24,4 @@
 <p>Speaking of steps, what are the steps to release software on ATR? We have kept this as simple as possible. First, the project's participants compose a candidate release from existing files. Second, as per ASF policy, the PMC votes on that candidate release. Third, if the vote passes, the PMC officially publishes and announces the erstwhile candidate release as a finished, official release. That's the whole process for the majority of PMCs, but of course there are many details and cons [...]
 <h2 id="who-develops-atr">Who develops ATR?</h2>
 <p>ATR is developed by ASF Tooling, an ASF initiative launched in 2025, and responsible for streamlining development, automating repetitive tasks, reducing technical debt, and enhancing collaboration throughout the ASF. The source code of ATR is developed in public as open source code, and ASF Tooling welcomes high quality contributions to the codebase from external contributors, whether from existing ASF contributors or members of the public. Because of the stringent security and usabil [...]
-<p>This manual is an integral part of ATR, and contributions to this manual are therefore treated like any of the rest of the code. We welcome all types of contribution, whether that be writing entire pages or correcting small typographical errors. The easiest path to contribution is to <a href="https://github.com/apache/tooling-trusted-release/compare">create a pull request</a> on <a href="https://github.com/apache/tooling-trusted-release">our GitHub repository</a>. You can also <a href [...]
+<p>This manual is an integral part of ATR, and contributions to this manual are therefore treated like any of the rest of the code. We welcome all types of contribution, whether that be writing entire pages or correcting small typographical errors. The easiest path to contribution is to <a href="https://github.com/apache/tooling-trusted-release/compare">create a pull request</a> on <a href="https://github.com/apache/tooling-trusted-release">our GitHub repository</a>. You can also <a href [...]
diff --git a/atr/docs/introduction-to-atr.md b/atr/docs/introduction-to-atr.md
index acc5b20..3e9440e 100644
--- a/atr/docs/introduction-to-atr.md
+++ b/atr/docs/introduction-to-atr.md
@@ -43,4 +43,4 @@ Speaking of steps, what are the steps to release software on ATR? We have kept t
 
 ATR is developed by ASF Tooling, an ASF initiative launched in 2025, and responsible for streamlining development, automating repetitive tasks, reducing technical debt, and enhancing collaboration throughout the ASF. The source code of ATR is developed in public as open source code, and ASF Tooling welcomes high quality contributions to the codebase from external contributors, whether from existing ASF contributors or members of the public. Because of the stringent security and usability [...]
 
-This manual is an integral part of ATR, and contributions to this manual are therefore treated like any of the rest of the code. We welcome all types of contribution, whether that be writing entire pages or correcting small typographical errors. The easiest path to contribution is to [create a pull request](https://github.com/apache/tooling-trusted-release/compare) on [our GitHub repository](https://github.com/apache/tooling-trusted-release). You can also [email patches](https://lists.ap [...]
+This manual is an integral part of ATR, and contributions to this manual are therefore treated like any of the rest of the code. We welcome all types of contribution, whether that be writing entire pages or correcting small typographical errors. The easiest path to contribution is to [create a pull request](https://github.com/apache/tooling-trusted-release/compare) on [our GitHub repository](https://github.com/apache/tooling-trusted-release). You can also [email patches](https://lists.ap [...]
diff --git a/scripts/build_docs.py b/scripts/docs_build.py
similarity index 100%
rename from scripts/build_docs.py
rename to scripts/docs_build.py
diff --git a/scripts/docs_check.py b/scripts/docs_check.py
new file mode 100755
index 0000000..ba36ff1
--- /dev/null
+++ b/scripts/docs_check.py
@@ -0,0 +1,159 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pathlib
+import re
+import sys
+from typing import Final, NamedTuple
+
+sys.path.insert(0, str(pathlib.Path(__file__).parent))
+import docs_post_process as post_process
+
+
+class Link(NamedTuple):
+    source_file: str
+    line_number: int
+    text: str
+    target: str
+    anchor: str | None
+
+
+class Heading(NamedTuple):
+    text: str
+    anchor: str
+
+
+# TODO: Should think more about whether scripts should use the _ convention or not
+# The rationale for using it is that then we can port to non-script code more easily
+# But for scripts *per se*, it does not make sense
+
+
+_LINK_PATTERN: Final = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
+_HEADING_PATTERN: Final = re.compile(r"^#+\s+(.+)$")
+
+
+def _extract_links(file_path: pathlib.Path) -> list[Link]:
+    content = file_path.read_text(encoding="utf-8")
+    lines = content.splitlines()
+    links = []
+
+    for line_number, line in enumerate(lines, start=1):
+        for match in _LINK_PATTERN.finditer(line):
+            text = match.group(1)
+            target = match.group(2)
+
+            if target.startswith("/ref/"):
+                continue
+
+            if target.startswith("http://") or target.startswith("https://"):
+                continue
+
+            anchor = None
+            if "#" in target:
+                target, anchor = target.split("#", 1)
+
+            links.append(Link(file_path.name, line_number, text, target, anchor))
+
+    return links
+
+
+def _extract_headings(file_path: pathlib.Path) -> list[Heading]:
+    content = file_path.read_text(encoding="utf-8")
+    lines = content.splitlines()
+    headings = []
+
+    for line in lines:
+        match = _HEADING_PATTERN.match(line)
+        if match:
+            text = match.group(1)
+            anchor = post_process.generate_heading_id(text)
+            headings.append(Heading(text, anchor))
+
+    return headings
+
+
+def _validate_links(docs_dir: pathlib.Path, all_links: list[Link]) -> list[str]:
+    errors = []
+    existing_files = {f.stem for f in docs_dir.glob("*.md")}
+    heading_cache: dict[str, set[str]] = {}
+
+    for link in all_links:
+        if link.target == ".":
+            target_file = "index"
+        elif link.target:
+            if link.target.endswith(".html"):
+                errors.append(
+                    f"{link.source_file}:{link.line_number}: Link should not include '.html' extension: '{link.target}'"
+                )
+                target_file = link.target.removesuffix(".html")
+            else:
+                target_file = link.target
+        else:
+            target_file = link.source_file.replace(".md", "")
+
+        if target_file not in existing_files:
+            errors.append(
+                f"{link.source_file}:{link.line_number}: "
+                f"Link to non-existent file '{link.target}' "
+                f"(expected {target_file}.md)"
+            )
+            continue
+
+        if link.anchor:
+            if target_file not in heading_cache:
+                target_path = docs_dir / f"{target_file}.md"
+                headings = _extract_headings(target_path)
+                heading_cache[target_file] = {h.anchor for h in headings}
+
+            if link.anchor not in heading_cache[target_file]:
+                errors.append(
+                    f"{link.source_file}:{link.line_number}: "
+                    f"Link to non-existent anchor '#{link.anchor}' in '{target_file}'"
+                )
+
+    return errors
+
+
+def main() -> None:
+    docs_dir = pathlib.Path("atr/docs")
+
+    if not docs_dir.exists():
+        print(f"Error: {docs_dir} not found", file=sys.stderr)
+        sys.exit(1)
+
+    all_links = []
+    for md_file in docs_dir.glob("*.md"):
+        links = _extract_links(md_file)
+        all_links.extend(links)
+
+    errors = _validate_links(docs_dir, all_links)
+
+    if errors:
+        print("Documentation link validation errors:\n", file=sys.stderr)
+        for error in errors:
+            print(error, file=sys.stderr)
+        print(f"\nFound {len(errors)} error(s)", file=sys.stderr)
+        sys.exit(1)
+
+    print(f"Validated {len(all_links)} links across {len(list(docs_dir.glob('*.md')))} files")
+    print("All links are valid")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/docs_post_process.py b/scripts/docs_post_process.py
index 9211c62..caa0859 100644
--- a/scripts/docs_post_process.py
+++ b/scripts/docs_post_process.py
@@ -22,6 +22,15 @@ import re
 import sys
 
 
+def generate_heading_id(text: str) -> str:
+    text = re.sub(r"^\d+\.\s*", "", text)
+    text = text.lower()
+    text = re.sub(r"[^\w\s-]", "", text)
+    text = re.sub(r"[\s_]+", "-", text)
+    text = text.strip("-")
+    return text
+
+
 class HeadingProcessor(parser.HTMLParser):
     def __init__(self) -> None:
         super().__init__()
@@ -68,12 +77,7 @@ class HeadingProcessor(parser.HTMLParser):
             self.output.append(text)
 
     def _generate_id(self, text: str) -> str:
-        text = re.sub(r"^\d+\.\s*", "", text)
-        text = text.lower()
-        text = re.sub(r"[^\w\s-]", "", text)
-        text = re.sub(r"[\s_]+", "-", text)
-        text = text.strip("-")
-        return text
+        return generate_heading_id(text)
 
     def get_html(self) -> str:
         return "".join(self.output)
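
For readers following the anchor check, here is a minimal standalone sketch of how a Markdown link's anchor can be compared with the heading IDs of the target page. It copies the steps of generate_heading_id and the two regular expressions from the diff above. The helper names anchors_in and split_link, and the sample strings, are hypothetical and do not appear in the commit.

```python
from __future__ import annotations

import re

# Same patterns as _LINK_PATTERN and _HEADING_PATTERN in scripts/docs_check.py
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
HEADING_PATTERN = re.compile(r"^#+\s+(.+)$")


def generate_heading_id(text: str) -> str:
    # Same steps as the function moved to module level in docs_post_process.py
    text = re.sub(r"^\d+\.\s*", "", text)  # drop a leading "1. " style number
    text = text.lower()
    text = re.sub(r"[^\w\s-]", "", text)  # drop punctuation
    text = re.sub(r"[\s_]+", "-", text)  # whitespace and underscores to hyphens
    return text.strip("-")


def anchors_in(markdown: str) -> set[str]:
    # Hypothetical helper: collect the IDs of every heading in a Markdown string
    anchors = set()
    for line in markdown.splitlines():
        match = HEADING_PATTERN.match(line)
        if match:
            anchors.add(generate_heading_id(match.group(1)))
    return anchors


def split_link(target: str) -> tuple[str, str | None]:
    # Hypothetical helper: split "page#anchor" into its two parts
    if "#" in target:
        page, anchor = target.split("#", 1)
        return page, anchor
    return target, None


# Hypothetical sample data, not taken from the ATR documentation
sample_page = "# 1. Introduction\n\n## Who develops ATR?\n"
sample_line = "See [the team](introduction-to-atr#who-develops-atr) for details."

match = LINK_PATTERN.search(sample_line)
assert match is not None
page, anchor = split_link(match.group(2))
print(page, anchor, anchor in anchors_in(sample_page))
```

Because the checker and the post-processor share one ID function, an anchor that passes the check on the Markdown source should correspond to the id attribute that post-processing writes into the HTML.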

