This is an automated email from the ASF dual-hosted git repository.
sbp pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tooling-trusted-releases.git
The following commit(s) were added to refs/heads/main by this push:
new 8b794f4 Add a documentation link checker and fix some documentation bugs
8b794f4 is described below
commit 8b794f47754d4cf0678255b5cfac0194466c61a5
Author: Sean B. Palmer <[email protected]>
AuthorDate: Sun Nov 16 15:14:16 2025 +0000
Add a documentation link checker and fix some documentation bugs
---
atr/docs/build-processes.html | 8 +++----
atr/docs/code-conventions.md | 2 +-
atr/docs/overview-of-the-code.html | 2 +-
atr/docs/overview-of-the-code.md | 2 +-
atr/docs/storage-interface.html | 2 +-
atr/docs/storage-interface.md | 2 +-
scripts/docs_check.py | 44 ++++++++++++++++++++++++++++++++++++++
7 files changed, 53 insertions(+), 9 deletions(-)
diff --git a/atr/docs/build-processes.html b/atr/docs/build-processes.html
index 6ef5075..4dec9fc 100644
--- a/atr/docs/build-processes.html
+++ b/atr/docs/build-processes.html
@@ -8,8 +8,8 @@
</ul>
<h2 id="documentation-build-script">Documentation build script</h2>
<p>To <strong>regenerate the documentation</strong>, run <code>make
docs</code>.</p>
-<p>The ATR documentation that you're reading right now is structured like a
book, with numbered chapters, sections, and navigation links between pages. We
could maintain all of this by hand, but that would be tedious and error-prone.
Instead, we use <a
href="/ref/scripts/build_docs.py"><code>scripts/build_docs.py</code></a> to
generate the navigation automatically from a single table of contents.</p>
+<p>The ATR documentation that you're reading right now is structured like a
book, with numbered chapters, sections, and navigation links between pages. We
could maintain all of this by hand, but that would be tedious and error-prone.
Instead, we use <a
href="/ref/scripts/docs_build.py"><code>scripts/docs_build.py</code></a> to
generate the navigation automatically from a single table of contents.</p>
<p>The script reads the table of contents in <a
href="/ref/atr/docs/index.md"><code>atr/docs/index.md</code></a>, extracts the
hierarchy of pages, and then updates every referenced page to include the
correct navigation links, page numbers, and section listings. This means that
when we want to reorganize the documentation (say, inserting a new chapter or
moving sections around) we only need to edit the table of contents, run the
script, and all the navigation is updated automatically.</p>
-<p>The implementation is straightforward. The <a
href="/ref/scripts/build_docs.py:parse_toc"><code>parse_toc</code></a> function
extracts entries from the table of contents section in the index, and <a
href="/ref/scripts/build_docs.py:build_navigation"><code>build_navigation</code></a>
computes the up, previous, and next relationships for each page. The <a
href="/ref/scripts/build_docs.py:update_document"><code>update_document</code></a>
function is then called for each page, which rewri [...]
-<p>The navigation block itself is generated by <a
href="/ref/scripts/build_docs.py:generate_navigation_block"><code>generate_navigation_block</code></a>,
which formats the up, previous, and next links, adds a list of subpages if any
exist, and includes a table of contents for the page's sections as extracted by
<a
href="/ref/scripts/build_docs.py:extract_h2_headings"><code>extract_h2_headings</code></a>.
This keeps all of the navigational machinery separate from the actual content,
which [...]
-<p>We also validate that every page in the table of contents exists, and that
there are no unlinked Markdown files in the documentation directory. The <a
href="/ref/scripts/build_docs.py:validate_files"><code>validate_files</code></a>
function performs these checks and fails with a descriptive error if anything
is wrong. This prevents us from accidentally forgetting to add a page to the
table of contents, or from leaving old pages lying around that we meant to
delete.</p>
+<p>The implementation is straightforward. The <a
href="/ref/scripts/docs_build.py:parse_toc"><code>parse_toc</code></a> function
extracts entries from the table of contents section in the index, and <a
href="/ref/scripts/docs_build.py:build_navigation"><code>build_navigation</code></a>
computes the up, previous, and next relationships for each page. The <a
href="/ref/scripts/docs_build.py:update_document"><code>update_document</code></a>
function is then called for each page, which rewri [...]
+<p>The navigation block itself is generated by <a
href="/ref/scripts/docs_build.py:generate_navigation_block"><code>generate_navigation_block</code></a>,
which formats the up, previous, and next links, adds a list of subpages if any
exist, and includes a table of contents for the page's sections as extracted by
<a
href="/ref/scripts/docs_build.py:extract_h2_headings"><code>extract_h2_headings</code></a>.
This keeps all of the navigational machinery separate from the actual content,
which [...]
+<p>We also validate that every page in the table of contents exists, and that
there are no unlinked Markdown files in the documentation directory. The <a
href="/ref/scripts/docs_build.py:validate_files"><code>validate_files</code></a>
function performs these checks and fails with a descriptive error if anything
is wrong. This prevents us from accidentally forgetting to add a page to the
table of contents, or from leaving old pages lying around that we meant to
delete.</p>
diff --git a/atr/docs/code-conventions.md b/atr/docs/code-conventions.md
index 4408c95..2b87c1b 100644
--- a/atr/docs/code-conventions.md
+++ b/atr/docs/code-conventions.md
@@ -311,7 +311,7 @@ a or b and c == d or not e or f
(a or b) and (c == d) or (not e) or f
```
-Because `f` is not a complex expression, it does not get parenthesised. Also
because this rule is about subexpressions only, we do not put parethenses
around the top level.
+Because `f` is not a complex expression, it does not get parenthesised. Also
because this rule is about subexpressions only, we do not put parentheses
around the top level.
```python
# Avoid
diff --git a/atr/docs/overview-of-the-code.html b/atr/docs/overview-of-the-code.html
index 8823da0..0f8734b 100644
--- a/atr/docs/overview-of-the-code.html
+++ b/atr/docs/overview-of-the-code.html
@@ -39,7 +39,7 @@
<p>The ATR <a href="/ref/atr/worker.py"><code>worker</code></a> module
implements the workers. Each worker process runs in a loop. It claims the
oldest queued task from the database, executes it, records the result, and then
claims the next task atomically using an <code>UPDATE ... WHERE</code>
statement. After a worker has processed a fixed number of tasks, it exits
voluntarily to help to avoid memory leaks. The manager then spawns a fresh
worker to replace it. Task execution happens in [...]
<p>Tasks themselves are defined in the ATR <a
href="/ref/atr/tasks/"><code>tasks</code></a> directory. The <a
href="/ref/atr/tasks/__init__.py"><code>tasks</code></a> module contains
functions for queueing tasks and resolving task types to their handler
functions. Task types include operations such as importing keys, generating
SBOMs, sending messages, and importing files from SVN. The most common category
of task is automated checks on release artifacts. These checks are implemented
in [...]
<h2 id="api">API</h2>
-<p>The ATR API provides programmatic access to most ATR functionality. API
endpoints are defined in <a
href="/ref/atr/api/routes.py"><code>api.routses</code></a>, and their URL paths
are prefixed with <code>/api/</code>. The API uses <a
href="https://www.openapis.org/">OpenAPI</a> for documentation, which is
automatically generated from the endpoint definitions and served at
<code>/api/docs</code>. Users send requests with a <a
href="https://en.wikipedia.org/wiki/JSON_Web_Token">JWT</a> [...]
+<p>The ATR API provides programmatic access to most ATR functionality. API
endpoints are defined in <a
href="/ref/atr/api/__init__.py"><code>api</code></a>, and their URL paths are
prefixed with <code>/api/</code>. The API uses <a
href="https://www.openapis.org/">OpenAPI</a> for documentation, which is
automatically generated from the endpoint definitions and served at
<code>/api/docs</code>. Users send requests with a <a
href="https://en.wikipedia.org/wiki/JSON_Web_Token">JWT</a> create [...]
<p>API request and response models are defined in <a
href="/ref/atr/models/api.py"><code>models.api</code></a> using Pydantic. Each
endpoint has an associated request model that validates incoming data, and a
response model that validates outgoing data. The API returns JSON in all cases,
with appropriate HTTP status codes.</p>
<h2 id="other-important-interfaces">Other important interfaces</h2>
<p>ATR uses ASF OAuth for user login, and then determines what actions each
user can perform based on their committee memberships. The ATR <a
href="/ref/atr/principal.py"><code>principal</code></a> module handles
authorization by checking whether users are members of relevant committees. It
queries and caches LDAP to get committee membership information. The <a
href="/ref/atr/principal.py:Authorisation"><code>Authorisation</code></a> class
provides methods to check whether a user is a me [...]
diff --git a/atr/docs/overview-of-the-code.md b/atr/docs/overview-of-the-code.md
index 935330f..a63da07 100644
--- a/atr/docs/overview-of-the-code.md
+++ b/atr/docs/overview-of-the-code.md
@@ -70,7 +70,7 @@ Tasks themselves are defined in the ATR [`tasks`](/ref/atr/tasks/) directory. Th
## API
-The ATR API provides programmatic access to most ATR functionality. API
endpoints are defined in [`api.routses`](/ref/atr/api/routes.py), and their URL
paths are prefixed with `/api/`. The API uses
[OpenAPI](https://www.openapis.org/) for documentation, which is automatically
generated from the endpoint definitions and served at `/api/docs`. Users send
requests with a [JWT](https://en.wikipedia.org/wiki/JSON_Web_Token) created
from a [PAT](https://en.wikipedia.org/wiki/Personal_access_to [...]
+The ATR API provides programmatic access to most ATR functionality. API
endpoints are defined in [`api`](/ref/atr/api/__init__.py), and their URL paths
are prefixed with `/api/`. The API uses [OpenAPI](https://www.openapis.org/)
for documentation, which is automatically generated from the endpoint
definitions and served at `/api/docs`. Users send requests with a
[JWT](https://en.wikipedia.org/wiki/JSON_Web_Token) created from a
[PAT](https://en.wikipedia.org/wiki/Personal_access_token). [...]
API request and response models are defined in
[`models.api`](/ref/atr/models/api.py) using Pydantic. Each endpoint has an
associated request model that validates incoming data, and a response model
that validates outgoing data. The API returns JSON in all cases, with
appropriate HTTP status codes.
diff --git a/atr/docs/storage-interface.html b/atr/docs/storage-interface.html
index 54dd0ef..af3c8df 100644
--- a/atr/docs/storage-interface.html
+++ b/atr/docs/storage-interface.html
@@ -26,7 +26,7 @@
<p>The <code>wacp</code> object, short for <code>w</code>rite <code>a</code>s
<code>c</code>ommittee <code>p</code>articipant, provides access to
domain-specific writers: <code>announce</code>, <code>checks</code>,
<code>distributions</code>, <code>keys</code>, <code>policy</code>,
<code>project</code>, <code>release</code>, <code>sbom</code>,
<code>ssh</code>, <code>tokens</code>, and <code>vote</code>.</p>
<p>The write session takes an optional <a
href="/ref/atr/web.py:Committer"><code>Committer</code></a> or ASF UID,
typically <code>session.uid</code> from the logged-in user. If you omit the
UID, the session determines it automatically from the current request context.
The write object checks LDAP memberships and raises <a
href="/ref/atr/storage/__init__.py:AccessError"><code>storage.AccessError</code></a>
if the user is not authorized for the requested permission level.</p>
<p>Because projects belong to committees, we provide <a
href="/ref/atr/storage/__init__.py:as_project_committee_member"><code>write.as_project_committee_member(project_name)</code></a>
and <a
href="/ref/atr/storage/__init__.py:as_project_committee_participant"><code>write.as_project_committee_participant(project_name)</code></a>,
which look up the project's committee and authenticate the user as a member or
participant of that committee. This is convenient when, for example, the URL
prov [...]
-<p>Here is a more complete example from <a
href="/ref/atr/api/routes.py"><code>api/routes.py</code></a> that shows the
classic three step pattern:</p>
+<p>Here is a more complete example from <a
href="/ref/atr/api/__init__.py"><code>api/__init__.py</code></a> that shows the
classic three step pattern:</p>
<pre><code class="language-python">async with storage.write(asf_uid) as write:
# 1. Request permissions
wafc = write.as_foundation_committer()
diff --git a/atr/docs/storage-interface.md b/atr/docs/storage-interface.md
index eff6110..75f86f8 100644
--- a/atr/docs/storage-interface.md
+++ b/atr/docs/storage-interface.md
@@ -43,7 +43,7 @@ The write session takes an optional [`Committer`](/ref/atr/web.py:Committer) or
Because projects belong to committees, we provide
[`write.as_project_committee_member(project_name)`](/ref/atr/storage/__init__.py:as_project_committee_member)
and
[`write.as_project_committee_participant(project_name)`](/ref/atr/storage/__init__.py:as_project_committee_participant),
which look up the project's committee and authenticate the user as a member or
participant of that committee. This is convenient when, for example, the URL
provides a project name.
-Here is a more complete example from [`api/routes.py`](/ref/atr/api/routes.py)
that shows the classic three step pattern:
+Here is a more complete example from
[`api/__init__.py`](/ref/atr/api/__init__.py) that shows the classic three step
pattern:
```python
async with storage.write(asf_uid) as write:
diff --git a/scripts/docs_check.py b/scripts/docs_check.py
index ed204cb..3c9b906 100755
--- a/scripts/docs_check.py
+++ b/scripts/docs_check.py
@@ -39,6 +39,13 @@ class Heading(NamedTuple):
     anchor: str
+class RefLink(NamedTuple):
+    source_file: str
+    line_number: int
+    text: str
+    target: str
+
+
 # TODO: Should think more about whether scripts should use the _ convention or not
 # The rationale for using it is that then we can port to non-script code more easily
 # But for scripts *per se*, it does not make sense
@@ -46,6 +53,7 @@ class Heading(NamedTuple):
 _LINK_PATTERN: Final = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
 _HEADING_PATTERN: Final = re.compile(r"^#+\s+(.+)$")
+_REF_LINK_PATTERN: Final = re.compile(r"\[([^\]]+)\]\(/ref/([^)]+)\)")
 def _extract_links(file_path: pathlib.Path) -> list[Link]:
@@ -88,6 +96,24 @@ def _extract_headings(file_path: pathlib.Path) -> list[Heading]:
     return headings
+def _extract_ref_links(file_path: pathlib.Path) -> list[RefLink]:
+    content = file_path.read_text(encoding="utf-8")
+    lines = content.splitlines()
+    links = []
+
+    for line_number, line in enumerate(lines, start=1):
+        for match in _REF_LINK_PATTERN.finditer(line):
+            text = match.group(1)
+            target = match.group(2)
+
+            if ":" in target:
+                target = target.split(":", 1)[0]
+
+            links.append(RefLink(file_path.name, line_number, text, target))
+
+    return links
+
+
 def _validate_links(docs_dir: pathlib.Path, all_links: list[Link]) -> list[str]:
     errors = []
     existing_files = {f.stem for f in docs_dir.glob("*.md")}
@@ -130,6 +156,17 @@ def _validate_links(docs_dir: pathlib.Path, all_links: list[Link]) -> list[str]:
     return errors
+def _validate_ref_links(project_root: pathlib.Path, all_ref_links: list[RefLink]) -> list[str]:
+    errors = []
+
+    for link in all_ref_links:
+        file_path = project_root / link.target
+        if not file_path.exists():
+            errors.append(f"{link.source_file}:{link.line_number}: Ref link to non-existent file '/ref/{link.target}'")
+
+    return errors
+
+
 def main() -> None:
     docs_dir = pathlib.Path("atr/docs")
@@ -137,12 +174,18 @@ def main() -> None:
         print(f"Error: {docs_dir} not found", file=sys.stderr)
         sys.exit(1)
+    project_root = docs_dir.parent.parent
+
     all_links = []
+    all_ref_links = []
     for md_file in docs_dir.glob("*.md"):
         links = _extract_links(md_file)
         all_links.extend(links)
+        ref_links = _extract_ref_links(md_file)
+        all_ref_links.extend(ref_links)
     errors = _validate_links(docs_dir, all_links)
+    errors.extend(_validate_ref_links(project_root, all_ref_links))
     if errors:
         print("Documentation link validation errors:\n", file=sys.stderr)
@@ -152,6 +195,7 @@ def main() -> None:
         sys.exit(1)
     print(f"Validated {len(all_links)} links across {len(list(docs_dir.glob('*.md')))} files")
+    print(f"Validated {len(all_ref_links)} ref links")
     print("All links are valid")
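
The ref link check added in this commit can be tried in isolation. The following is a minimal sketch, not the script itself: it reuses the regular expression and the error message format from the diff above, but the function names without the leading underscore, the `demo` helper, the sample Markdown text, and the temporary directory layout are assumptions made for illustration only.

```python
import pathlib
import re
import tempfile
from typing import Final, NamedTuple


class RefLink(NamedTuple):
    source_file: str
    line_number: int
    text: str
    target: str


# Same pattern as _REF_LINK_PATTERN in the diff: a Markdown link whose URL starts with /ref/
REF_LINK_PATTERN: Final = re.compile(r"\[([^\]]+)\]\(/ref/([^)]+)\)")


def extract_ref_links(source_file: str, content: str) -> list[RefLink]:
    """Collect /ref/ links from Markdown text, one RefLink per match."""
    links = []
    for line_number, line in enumerate(content.splitlines(), start=1):
        for match in REF_LINK_PATTERN.finditer(line):
            # Drop a trailing ":symbol" suffix so that only the file path is checked
            target = match.group(2).split(":", 1)[0]
            links.append(RefLink(source_file, line_number, match.group(1), target))
    return links


def validate_ref_links(project_root: pathlib.Path, links: list[RefLink]) -> list[str]:
    """Return one error string for each link whose target path does not exist."""
    return [
        f"{link.source_file}:{link.line_number}: Ref link to non-existent file '/ref/{link.target}'"
        for link in links
        if not (project_root / link.target).exists()
    ]


def demo() -> list[str]:
    # Hypothetical sample text: the first link resolves, the second is stale
    sample = (
        "See [`parse_toc`](/ref/scripts/docs_build.py:parse_toc).\n"
        "Old link: [`api/routes.py`](/ref/atr/api/routes.py)\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "scripts").mkdir()
        (root / "scripts" / "docs_build.py").write_text("", encoding="utf-8")
        return validate_ref_links(root, extract_ref_links("example.md", sample))


if __name__ == "__main__":
    for error in demo():
        print(error)
```

Note that, as in the diff, only the existence of the target file is checked. The symbol after the colon (for example `parse_toc`) is discarded and not verified against the file's contents.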