This is an automated email from the ASF dual-hosted git repository.
sbp pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tooling-trusted-releases.git
The following commit(s) were added to refs/heads/main by this push:
new 673620f Add an internal link checker for the documentation, and fix some links
673620f is described below
commit 673620fefc687e1facb4867696d7e6a9ae4cf5a5
Author: Sean B. Palmer <[email protected]>
AuthorDate: Fri Oct 10 18:44:53 2025 +0100
Add an internal link checker for the documentation, and fix some links
---
Makefile | 4 +-
atr/docs/developer-guide.html | 2 +-
atr/docs/developer-guide.md | 2 +-
atr/docs/introduction-to-atr.html | 2 +-
atr/docs/introduction-to-atr.md | 2 +-
scripts/{build_docs.py => docs_build.py} | 0
scripts/docs_check.py | 159 +++++++++++++++++++++++++++++++
scripts/docs_post_process.py | 16 ++--
8 files changed, 176 insertions(+), 11 deletions(-)
diff --git a/Makefile b/Makefile
index 69df453..b87dfbe 100644
--- a/Makefile
+++ b/Makefile
@@ -48,13 +48,15 @@ commit:
 	git push
 
 docs:
+	uv run python3 scripts/docs_check.py
 	rm -f atr/docs/*.html docs/*.html
-	uv run python3 scripts/build_docs.py
+	uv run python3 scripts/docs_build.py
 	for fn in atr/docs/*.md docs/*.md; \
 	do \
 	cmark "$$fn" > "$${fn%.md}.html"; \
 	done
 	uv run python3 scripts/docs_post_process.py atr/docs/*.html docs/*.html
+	uv run python3 scripts/docs_check.py
 
 generate-version:
 	@rm -f atr/version.py
diff --git a/atr/docs/developer-guide.html b/atr/docs/developer-guide.html
index 5a0d296..d046b56 100644
--- a/atr/docs/developer-guide.html
+++ b/atr/docs/developer-guide.html
@@ -16,4 +16,4 @@
<li><a href="#introduction">Introduction</a></li>
</ul>
<h2 id="introduction">Introduction</h2>
-<p>This is a guide for developers of ATR, explaining how to make changes to the ATR source code. For more information about how to contribute those changes back to us, please read the <a href="contribution-guide">contribution guide</a> instead.</p>
+<p>This is a guide for developers of ATR, explaining how to make changes to the ATR source code. For more information about how to contribute those changes back to us, please read the <a href="how-to-contribute">contribution guide</a> instead.</p>
diff --git a/atr/docs/developer-guide.md b/atr/docs/developer-guide.md
index 3b565eb..7ba4881 100644
--- a/atr/docs/developer-guide.md
+++ b/atr/docs/developer-guide.md
@@ -21,4 +21,4 @@
## Introduction
-This is a guide for developers of ATR, explaining how to make changes to the ATR source code. For more information about how to contribute those changes back to us, please read the [contribution guide](contribution-guide) instead.
+This is a guide for developers of ATR, explaining how to make changes to the ATR source code. For more information about how to contribute those changes back to us, please read the [contribution guide](how-to-contribute) instead.
diff --git a/atr/docs/introduction-to-atr.html b/atr/docs/introduction-to-atr.html
index 82df722..0be0411 100644
--- a/atr/docs/introduction-to-atr.html
+++ b/atr/docs/introduction-to-atr.html
@@ -24,4 +24,4 @@
<p>Speaking of steps, what are the steps to release software on ATR? We have
kept this as simple as possible. First, the project's participants compose a
candidate release from existing files. Second, as per ASF policy, the PMC votes
on that candidate release. Third, if the vote passes, the PMC officially
publishes and announces the erstwhile candidate release as a finished, official
release. That's the whole process for the majority of PMCs, but of course there
are many details and cons [...]
<h2 id="who-develops-atr">Who develops ATR?</h2>
<p>ATR is developed by ASF Tooling, an ASF initiative launched in 2025, and
responsible for streamlining development, automating repetitive tasks, reducing
technical debt, and enhancing collaboration throughout the ASF. The source code
of ATR is developed in public as open source code, and ASF Tooling welcomes
high quality contributions to the codebase from external contributors, whether
from existing ASF contributors or members of the public. Because of the
stringent security and usabil [...]
-<p>This manual is an integral part of ATR, and contributions to this manual
are therefore treated like any of the rest of the code. We welcome all types of
contribution, whether that be writing entire pages or correcting small
typographical errors. The easiest path to contribution is to <a
href="https://github.com/apache/tooling-trusted-release/compare">create a pull
request</a> on <a href="https://github.com/apache/tooling-trusted-release">our
GitHub repository</a>. You can also <a href [...]
+<p>This manual is an integral part of ATR, and contributions to this manual
are therefore treated like any of the rest of the code. We welcome all types of
contribution, whether that be writing entire pages or correcting small
typographical errors. The easiest path to contribution is to <a
href="https://github.com/apache/tooling-trusted-release/compare">create a pull
request</a> on <a href="https://github.com/apache/tooling-trusted-release">our
GitHub repository</a>. You can also <a href [...]
diff --git a/atr/docs/introduction-to-atr.md b/atr/docs/introduction-to-atr.md
index acc5b20..3e9440e 100644
--- a/atr/docs/introduction-to-atr.md
+++ b/atr/docs/introduction-to-atr.md
@@ -43,4 +43,4 @@ Speaking of steps, what are the steps to release software on ATR? We have kept t
ATR is developed by ASF Tooling, an ASF initiative launched in 2025, and
responsible for streamlining development, automating repetitive tasks, reducing
technical debt, and enhancing collaboration throughout the ASF. The source code
of ATR is developed in public as open source code, and ASF Tooling welcomes
high quality contributions to the codebase from external contributors, whether
from existing ASF contributors or members of the public. Because of the
stringent security and usability [...]
-This manual is an integral part of ATR, and contributions to this manual are
therefore treated like any of the rest of the code. We welcome all types of
contribution, whether that be writing entire pages or correcting small
typographical errors. The easiest path to contribution is to [create a pull
request](https://github.com/apache/tooling-trusted-release/compare) on [our
GitHub repository](https://github.com/apache/tooling-trusted-release). You can
also [email patches](https://lists.ap [...]
+This manual is an integral part of ATR, and contributions to this manual are
therefore treated like any of the rest of the code. We welcome all types of
contribution, whether that be writing entire pages or correcting small
typographical errors. The easiest path to contribution is to [create a pull
request](https://github.com/apache/tooling-trusted-release/compare) on [our
GitHub repository](https://github.com/apache/tooling-trusted-release). You can
also [email patches](https://lists.ap [...]
diff --git a/scripts/build_docs.py b/scripts/docs_build.py
similarity index 100%
rename from scripts/build_docs.py
rename to scripts/docs_build.py
diff --git a/scripts/docs_check.py b/scripts/docs_check.py
new file mode 100755
index 0000000..ba36ff1
--- /dev/null
+++ b/scripts/docs_check.py
@@ -0,0 +1,159 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pathlib
+import re
+import sys
+from typing import Final, NamedTuple
+
+sys.path.insert(0, str(pathlib.Path(__file__).parent))
+import docs_post_process as post_process
+
+
+class Link(NamedTuple):
+    source_file: str
+    line_number: int
+    text: str
+    target: str
+    anchor: str | None
+
+
+class Heading(NamedTuple):
+    text: str
+    anchor: str
+
+
+# TODO: Should think more about whether scripts should use the _ convention or not
+# The rationale for using it is that then we can port to non-script code more easily
+# But for scripts *per se*, it does not make sense
+
+
+_LINK_PATTERN: Final = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
+_HEADING_PATTERN: Final = re.compile(r"^#+\s+(.+)$")
+
+
+def _extract_links(file_path: pathlib.Path) -> list[Link]:
+    content = file_path.read_text(encoding="utf-8")
+    lines = content.splitlines()
+    links = []
+
+    for line_number, line in enumerate(lines, start=1):
+        for match in _LINK_PATTERN.finditer(line):
+            text = match.group(1)
+            target = match.group(2)
+
+            if target.startswith("/ref/"):
+                continue
+
+            if target.startswith("http://") or target.startswith("https://"):
+                continue
+
+            anchor = None
+            if "#" in target:
+                target, anchor = target.split("#", 1)
+
+            links.append(Link(file_path.name, line_number, text, target, anchor))
+
+    return links
+
+
+def _extract_headings(file_path: pathlib.Path) -> list[Heading]:
+    content = file_path.read_text(encoding="utf-8")
+    lines = content.splitlines()
+    headings = []
+
+    for line in lines:
+        match = _HEADING_PATTERN.match(line)
+        if match:
+            text = match.group(1)
+            anchor = post_process.generate_heading_id(text)
+            headings.append(Heading(text, anchor))
+
+    return headings
+
+
+def _validate_links(docs_dir: pathlib.Path, all_links: list[Link]) -> list[str]:
+    errors = []
+    existing_files = {f.stem for f in docs_dir.glob("*.md")}
+    heading_cache: dict[str, set[str]] = {}
+
+    for link in all_links:
+        if link.target == ".":
+            target_file = "index"
+        elif link.target:
+            if link.target.endswith(".html"):
+                errors.append(
+                    f"{link.source_file}:{link.line_number}: Link should not include '.html' extension: '{link.target}'"
+                )
+                target_file = link.target.removesuffix(".html")
+            else:
+                target_file = link.target
+        else:
+            target_file = link.source_file.replace(".md", "")
+
+        if target_file not in existing_files:
+            errors.append(
+                f"{link.source_file}:{link.line_number}: "
+                f"Link to non-existent file '{link.target}' "
+                f"(expected {target_file}.md)"
+            )
+            continue
+
+        if link.anchor:
+            if target_file not in heading_cache:
+                target_path = docs_dir / f"{target_file}.md"
+                headings = _extract_headings(target_path)
+                heading_cache[target_file] = {h.anchor for h in headings}
+
+            if link.anchor not in heading_cache[target_file]:
+                errors.append(
+                    f"{link.source_file}:{link.line_number}: "
+                    f"Link to non-existent anchor '#{link.anchor}' in '{target_file}'"
+                )
+
+    return errors
+
+
+def main() -> None:
+    docs_dir = pathlib.Path("atr/docs")
+
+    if not docs_dir.exists():
+        print(f"Error: {docs_dir} not found", file=sys.stderr)
+        sys.exit(1)
+
+    all_links = []
+    for md_file in docs_dir.glob("*.md"):
+        links = _extract_links(md_file)
+        all_links.extend(links)
+
+    errors = _validate_links(docs_dir, all_links)
+
+    if errors:
+        print("Documentation link validation errors:\n", file=sys.stderr)
+        for error in errors:
+            print(error, file=sys.stderr)
+        print(f"\nFound {len(errors)} error(s)", file=sys.stderr)
+        sys.exit(1)
+
+    print(f"Validated {len(all_links)} links across {len(list(docs_dir.glob('*.md')))} files")
+    print("All links are valid")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/docs_post_process.py b/scripts/docs_post_process.py
index 9211c62..caa0859 100644
--- a/scripts/docs_post_process.py
+++ b/scripts/docs_post_process.py
@@ -22,6 +22,15 @@ import re
 import sys
 
 
+def generate_heading_id(text: str) -> str:
+    text = re.sub(r"^\d+\.\s*", "", text)
+    text = text.lower()
+    text = re.sub(r"[^\w\s-]", "", text)
+    text = re.sub(r"[\s_]+", "-", text)
+    text = text.strip("-")
+    return text
+
+
 class HeadingProcessor(parser.HTMLParser):
     def __init__(self) -> None:
         super().__init__()
@@ -68,12 +77,7 @@ class HeadingProcessor(parser.HTMLParser):
self.output.append(text)
 
     def _generate_id(self, text: str) -> str:
-        text = re.sub(r"^\d+\.\s*", "", text)
-        text = text.lower()
-        text = re.sub(r"[^\w\s-]", "", text)
-        text = re.sub(r"[\s_]+", "-", text)
-        text = text.strip("-")
-        return text
+        return generate_heading_id(text)
 
     def get_html(self) -> str:
         return "".join(self.output)
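
The checker relies on the heading ID function moved to module level in this commit: a link such as `page#anchor` is valid only if some heading in `page.md` produces that same ID. The following is a minimal sketch of that relationship, not part of the commit. `generate_heading_id` and the link pattern are copied from the diff above so the block runs on its own; `split_target` and the sample Markdown line are hypothetical and exist only for illustration.

```python
import re
from typing import Optional, Tuple


def generate_heading_id(text: str) -> str:
    # Copied from scripts/docs_post_process.py in the diff above.
    text = re.sub(r"^\d+\.\s*", "", text)  # drop a leading "1. " style number
    text = text.lower()
    text = re.sub(r"[^\w\s-]", "", text)   # remove punctuation such as "?"
    text = re.sub(r"[\s_]+", "-", text)    # whitespace and underscores become "-"
    text = text.strip("-")
    return text


# Same expression as _LINK_PATTERN in scripts/docs_check.py.
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def split_target(target: str) -> Tuple[str, Optional[str]]:
    # Hypothetical helper mirroring the "#" handling in _extract_links.
    if "#" in target:
        page, anchor = target.split("#", 1)
        return page, anchor
    return target, None


# Hypothetical Markdown line; the page and heading names come from the diff.
sample = "See [who develops ATR](introduction-to-atr#who-develops-atr)."
match = LINK_PATTERN.search(sample)
page, anchor = split_target(match.group(2))

# The anchor is valid when a heading on the target page yields the same ID.
heading_ids = {generate_heading_id("Who develops ATR?")}
print(page, anchor, anchor in heading_ids)
```

Because both the HTML post-processor and the checker call the same function, an anchor that passes the check should match the `id` attribute written into the generated HTML.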