This is an automated email from the ASF dual-hosted git repository.

sbp pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tooling-trusted-releases.git


The following commit(s) were added to refs/heads/main by this push:
     new 8b794f4  Add a documentation link checker and fix some documentation bugs
8b794f4 is described below

commit 8b794f47754d4cf0678255b5cfac0194466c61a5
Author: Sean B. Palmer <[email protected]>
AuthorDate: Sun Nov 16 15:14:16 2025 +0000

    Add a documentation link checker and fix some documentation bugs
---
 atr/docs/build-processes.html      |  8 +++----
 atr/docs/code-conventions.md       |  2 +-
 atr/docs/overview-of-the-code.html |  2 +-
 atr/docs/overview-of-the-code.md   |  2 +-
 atr/docs/storage-interface.html    |  2 +-
 atr/docs/storage-interface.md      |  2 +-
 scripts/docs_check.py              | 44 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/atr/docs/build-processes.html b/atr/docs/build-processes.html
index 6ef5075..4dec9fc 100644
--- a/atr/docs/build-processes.html
+++ b/atr/docs/build-processes.html
@@ -8,8 +8,8 @@
 </ul>
 <h2 id="documentation-build-script">Documentation build script</h2>
 <p>To <strong>regenerate the documentation</strong>, run <code>make docs</code>.</p>
-<p>The ATR documentation that you're reading right now is structured like a book, with numbered chapters, sections, and navigation links between pages. We could maintain all of this by hand, but that would be tedious and error-prone. Instead, we use <a href="/ref/scripts/build_docs.py"><code>scripts/build_docs.py</code></a> to generate the navigation automatically from a single table of contents.</p>
+<p>The ATR documentation that you're reading right now is structured like a book, with numbered chapters, sections, and navigation links between pages. We could maintain all of this by hand, but that would be tedious and error-prone. Instead, we use <a href="/ref/scripts/docs_build.py"><code>scripts/docs_build.py</code></a> to generate the navigation automatically from a single table of contents.</p>
 <p>The script reads the table of contents in <a href="/ref/atr/docs/index.md"><code>atr/docs/index.md</code></a>, extracts the hierarchy of pages, and then updates every referenced page to include the correct navigation links, page numbers, and section listings. This means that when we want to reorganize the documentation (say, inserting a new chapter or moving sections around) we only need to edit the table of contents, run the script, and all the navigation is updated automatically.</p>
-<p>The implementation is straightforward. The <a href="/ref/scripts/build_docs.py:parse_toc"><code>parse_toc</code></a> function extracts entries from the table of contents section in the index, and <a href="/ref/scripts/build_docs.py:build_navigation"><code>build_navigation</code></a> computes the up, previous, and next relationships for each page. The <a href="/ref/scripts/build_docs.py:update_document"><code>update_document</code></a> function is then called for each page, which rewri [...]
-<p>The navigation block itself is generated by <a href="/ref/scripts/build_docs.py:generate_navigation_block"><code>generate_navigation_block</code></a>, which formats the up, previous, and next links, adds a list of subpages if any exist, and includes a table of contents for the page's sections as extracted by <a href="/ref/scripts/build_docs.py:extract_h2_headings"><code>extract_h2_headings</code></a>. This keeps all of the navigational machinery separate from the actual content, which [...]
-<p>We also validate that every page in the table of contents exists, and that there are no unlinked Markdown files in the documentation directory. The <a href="/ref/scripts/build_docs.py:validate_files"><code>validate_files</code></a> function performs these checks and fails with a descriptive error if anything is wrong. This prevents us from accidentally forgetting to add a page to the table of contents, or from leaving old pages lying around that we meant to delete.</p>
+<p>The implementation is straightforward. The <a href="/ref/scripts/docs_build.py:parse_toc"><code>parse_toc</code></a> function extracts entries from the table of contents section in the index, and <a href="/ref/scripts/docs_build.py:build_navigation"><code>build_navigation</code></a> computes the up, previous, and next relationships for each page. The <a href="/ref/scripts/docs_build.py:update_document"><code>update_document</code></a> function is then called for each page, which rewri [...]
+<p>The navigation block itself is generated by <a href="/ref/scripts/docs_build.py:generate_navigation_block"><code>generate_navigation_block</code></a>, which formats the up, previous, and next links, adds a list of subpages if any exist, and includes a table of contents for the page's sections as extracted by <a href="/ref/scripts/docs_build.py:extract_h2_headings"><code>extract_h2_headings</code></a>. This keeps all of the navigational machinery separate from the actual content, which [...]
+<p>We also validate that every page in the table of contents exists, and that there are no unlinked Markdown files in the documentation directory. The <a href="/ref/scripts/docs_build.py:validate_files"><code>validate_files</code></a> function performs these checks and fails with a descriptive error if anything is wrong. This prevents us from accidentally forgetting to add a page to the table of contents, or from leaving old pages lying around that we meant to delete.</p>
diff --git a/atr/docs/code-conventions.md b/atr/docs/code-conventions.md
index 4408c95..2b87c1b 100644
--- a/atr/docs/code-conventions.md
+++ b/atr/docs/code-conventions.md
@@ -311,7 +311,7 @@ a or b and c == d or not e or f
 (a or b) and (c == d) or (not e) or f
 ```
 
-Because `f` is not a complex expression, it does not get parenthesised. Also because this rule is about subexpressions only, we do not put parethenses around the top level.
+Because `f` is not a complex expression, it does not get parenthesised. Also because this rule is about subexpressions only, we do not put parentheses around the top level.
 
 ```python
 # Avoid
diff --git a/atr/docs/overview-of-the-code.html b/atr/docs/overview-of-the-code.html
index 8823da0..0f8734b 100644
--- a/atr/docs/overview-of-the-code.html
+++ b/atr/docs/overview-of-the-code.html
@@ -39,7 +39,7 @@
 <p>The ATR <a href="/ref/atr/worker.py"><code>worker</code></a> module implements the workers. Each worker process runs in a loop. It claims the oldest queued task from the database, executes it, records the result, and then claims the next task atomically using an <code>UPDATE ... WHERE</code> statement. After a worker has processed a fixed number of tasks, it exits voluntarily to help to avoid memory leaks. The manager then spawns a fresh worker to replace it. Task execution happens in [...]
 <p>Tasks themselves are defined in the ATR <a href="/ref/atr/tasks/"><code>tasks</code></a> directory. The <a href="/ref/atr/tasks/__init__.py"><code>tasks</code></a> module contains functions for queueing tasks and resolving task types to their handler functions. Task types include operations such as importing keys, generating SBOMs, sending messages, and importing files from SVN. The most common category of task is automated checks on release artifacts. These checks are implemented in  [...]
 <h2 id="api">API</h2>
-<p>The ATR API provides programmatic access to most ATR functionality. API endpoints are defined in <a href="/ref/atr/api/routes.py"><code>api.routses</code></a>, and their URL paths are prefixed with <code>/api/</code>. The API uses <a href="https://www.openapis.org/">OpenAPI</a> for documentation, which is automatically generated from the endpoint definitions and served at <code>/api/docs</code>. Users send requests with a <a href="https://en.wikipedia.org/wiki/JSON_Web_Token">JWT</a>  [...]
+<p>The ATR API provides programmatic access to most ATR functionality. API endpoints are defined in <a href="/ref/atr/api/__init__.py"><code>api</code></a>, and their URL paths are prefixed with <code>/api/</code>. The API uses <a href="https://www.openapis.org/">OpenAPI</a> for documentation, which is automatically generated from the endpoint definitions and served at <code>/api/docs</code>. Users send requests with a <a href="https://en.wikipedia.org/wiki/JSON_Web_Token">JWT</a> create [...]
 <p>API request and response models are defined in <a href="/ref/atr/models/api.py"><code>models.api</code></a> using Pydantic. Each endpoint has an associated request model that validates incoming data, and a response model that validates outgoing data. The API returns JSON in all cases, with appropriate HTTP status codes.</p>
 <h2 id="other-important-interfaces">Other important interfaces</h2>
 <p>ATR uses ASF OAuth for user login, and then determines what actions each user can perform based on their committee memberships. The ATR <a href="/ref/atr/principal.py"><code>principal</code></a> module handles authorization by checking whether users are members of relevant committees. It queries and caches LDAP to get committee membership information. The <a href="/ref/atr/principal.py:Authorisation"><code>Authorisation</code></a> class provides methods to check whether a user is a me [...]
diff --git a/atr/docs/overview-of-the-code.md b/atr/docs/overview-of-the-code.md
index 935330f..a63da07 100644
--- a/atr/docs/overview-of-the-code.md
+++ b/atr/docs/overview-of-the-code.md
@@ -70,7 +70,7 @@ Tasks themselves are defined in the ATR [`tasks`](/ref/atr/tasks/) directory. Th
 
 ## API
 
-The ATR API provides programmatic access to most ATR functionality. API endpoints are defined in [`api.routses`](/ref/atr/api/routes.py), and their URL paths are prefixed with `/api/`. The API uses [OpenAPI](https://www.openapis.org/) for documentation, which is automatically generated from the endpoint definitions and served at `/api/docs`. Users send requests with a [JWT](https://en.wikipedia.org/wiki/JSON_Web_Token) created from a [PAT](https://en.wikipedia.org/wiki/Personal_access_to [...]
+The ATR API provides programmatic access to most ATR functionality. API endpoints are defined in [`api`](/ref/atr/api/__init__.py), and their URL paths are prefixed with `/api/`. The API uses [OpenAPI](https://www.openapis.org/) for documentation, which is automatically generated from the endpoint definitions and served at `/api/docs`. Users send requests with a [JWT](https://en.wikipedia.org/wiki/JSON_Web_Token) created from a [PAT](https://en.wikipedia.org/wiki/Personal_access_token).  [...]
 
 API request and response models are defined in [`models.api`](/ref/atr/models/api.py) using Pydantic. Each endpoint has an associated request model that validates incoming data, and a response model that validates outgoing data. The API returns JSON in all cases, with appropriate HTTP status codes.
 
diff --git a/atr/docs/storage-interface.html b/atr/docs/storage-interface.html
index 54dd0ef..af3c8df 100644
--- a/atr/docs/storage-interface.html
+++ b/atr/docs/storage-interface.html
@@ -26,7 +26,7 @@
 <p>The <code>wacp</code> object, short for <code>w</code>rite <code>a</code>s <code>c</code>ommittee <code>p</code>articipant, provides access to domain-specific writers: <code>announce</code>, <code>checks</code>, <code>distributions</code>, <code>keys</code>, <code>policy</code>, <code>project</code>, <code>release</code>, <code>sbom</code>, <code>ssh</code>, <code>tokens</code>, and <code>vote</code>.</p>
 <p>The write session takes an optional <a href="/ref/atr/web.py:Committer"><code>Committer</code></a> or ASF UID, typically <code>session.uid</code> from the logged-in user. If you omit the UID, the session determines it automatically from the current request context. The write object checks LDAP memberships and raises <a href="/ref/atr/storage/__init__.py:AccessError"><code>storage.AccessError</code></a> if the user is not authorized for the requested permission level.</p>
 <p>Because projects belong to committees, we provide <a href="/ref/atr/storage/__init__.py:as_project_committee_member"><code>write.as_project_committee_member(project_name)</code></a> and <a href="/ref/atr/storage/__init__.py:as_project_committee_participant"><code>write.as_project_committee_participant(project_name)</code></a>, which look up the project's committee and authenticate the user as a member or participant of that committee. This is convenient when, for example, the URL prov [...]
-<p>Here is a more complete example from <a href="/ref/atr/api/routes.py"><code>api/routes.py</code></a> that shows the classic three step pattern:</p>
+<p>Here is a more complete example from <a href="/ref/atr/api/__init__.py"><code>api/__init__.py</code></a> that shows the classic three step pattern:</p>
 <pre><code class="language-python">async with storage.write(asf_uid) as write:
     # 1. Request permissions
     wafc = write.as_foundation_committer()
diff --git a/atr/docs/storage-interface.md b/atr/docs/storage-interface.md
index eff6110..75f86f8 100644
--- a/atr/docs/storage-interface.md
+++ b/atr/docs/storage-interface.md
@@ -43,7 +43,7 @@ The write session takes an optional [`Committer`](/ref/atr/web.py:Committer) or
 
 Because projects belong to committees, we provide [`write.as_project_committee_member(project_name)`](/ref/atr/storage/__init__.py:as_project_committee_member) and [`write.as_project_committee_participant(project_name)`](/ref/atr/storage/__init__.py:as_project_committee_participant), which look up the project's committee and authenticate the user as a member or participant of that committee. This is convenient when, for example, the URL provides a project name.
 
-Here is a more complete example from [`api/routes.py`](/ref/atr/api/routes.py) that shows the classic three step pattern:
+Here is a more complete example from [`api/__init__.py`](/ref/atr/api/__init__.py) that shows the classic three step pattern:
 
 ```python
 async with storage.write(asf_uid) as write:
diff --git a/scripts/docs_check.py b/scripts/docs_check.py
index ed204cb..3c9b906 100755
--- a/scripts/docs_check.py
+++ b/scripts/docs_check.py
@@ -39,6 +39,13 @@ class Heading(NamedTuple):
     anchor: str
 
 
+class RefLink(NamedTuple):
+    source_file: str
+    line_number: int
+    text: str
+    target: str
+
+
# TODO: Should think more about whether scripts should use the _ convention or not
# The rationale for using it is that then we can port to non-script code more easily
 # But for scripts *per se*, it does not make sense
@@ -46,6 +53,7 @@ class Heading(NamedTuple):
 
 _LINK_PATTERN: Final = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
 _HEADING_PATTERN: Final = re.compile(r"^#+\s+(.+)$")
+_REF_LINK_PATTERN: Final = re.compile(r"\[([^\]]+)\]\(/ref/([^)]+)\)")
 
 
 def _extract_links(file_path: pathlib.Path) -> list[Link]:
@@ -88,6 +96,24 @@ def _extract_headings(file_path: pathlib.Path) -> list[Heading]:
     return headings
 
 
+def _extract_ref_links(file_path: pathlib.Path) -> list[RefLink]:
+    content = file_path.read_text(encoding="utf-8")
+    lines = content.splitlines()
+    links = []
+
+    for line_number, line in enumerate(lines, start=1):
+        for match in _REF_LINK_PATTERN.finditer(line):
+            text = match.group(1)
+            target = match.group(2)
+
+            if ":" in target:
+                target = target.split(":", 1)[0]
+
+            links.append(RefLink(file_path.name, line_number, text, target))
+
+    return links
+
+
 def _validate_links(docs_dir: pathlib.Path, all_links: list[Link]) -> list[str]:
     errors = []
     existing_files = {f.stem for f in docs_dir.glob("*.md")}
@@ -130,6 +156,17 @@ def _validate_links(docs_dir: pathlib.Path, all_links: list[Link]) -> list[str]:
     return errors
 
 
+def _validate_ref_links(project_root: pathlib.Path, all_ref_links: list[RefLink]) -> list[str]:
+    errors = []
+
+    for link in all_ref_links:
+        file_path = project_root / link.target
+        if not file_path.exists():
+            errors.append(f"{link.source_file}:{link.line_number}: Ref link to non-existent file '/ref/{link.target}'")
+
+    return errors
+
+
 def main() -> None:
     docs_dir = pathlib.Path("atr/docs")
 
@@ -137,12 +174,18 @@ def main() -> None:
         print(f"Error: {docs_dir} not found", file=sys.stderr)
         sys.exit(1)
 
+    project_root = docs_dir.parent.parent
+
     all_links = []
+    all_ref_links = []
     for md_file in docs_dir.glob("*.md"):
         links = _extract_links(md_file)
         all_links.extend(links)
+        ref_links = _extract_ref_links(md_file)
+        all_ref_links.extend(ref_links)
 
     errors = _validate_links(docs_dir, all_links)
+    errors.extend(_validate_ref_links(project_root, all_ref_links))
 
     if errors:
         print("Documentation link validation errors:\n", file=sys.stderr)
@@ -152,6 +195,7 @@ def main() -> None:
         sys.exit(1)
 
     print(f"Validated {len(all_links)} links across {len(list(docs_dir.glob('*.md')))} files")
+    print(f"Validated {len(all_ref_links)} ref links")
     print("All links are valid")
 
 
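To see the new ref-link check in isolation, here is a standalone sketch that mirrors `_extract_ref_links` and `_validate_ref_links` from `scripts/docs_check.py`, but reads from an in-memory string and a temporary directory rather than from `atr/docs`. The underscore-free function names, the sample Markdown, and the temporary file tree are illustrative assumptions, not part of the commit. As in the commit, each line is scanned separately, and anything after the first `:` in a target (such as `:Committer`) is dropped before checking that the path exists.

```python
import pathlib
import re
import tempfile
from typing import Final, NamedTuple

# Same pattern as _REF_LINK_PATTERN in scripts/docs_check.py
REF_LINK_PATTERN: Final = re.compile(r"\[([^\]]+)\]\(/ref/([^)]+)\)")


class RefLink(NamedTuple):
    source_file: str
    line_number: int
    text: str
    target: str


def extract_ref_links(name: str, content: str) -> list[RefLink]:
    """Find Markdown links to /ref/..., dropping any ':symbol' suffix from the target."""
    links = []
    for line_number, line in enumerate(content.splitlines(), start=1):
        for match in REF_LINK_PATTERN.finditer(line):
            # "atr/web.py:Committer" -> "atr/web.py"
            target = match.group(2).split(":", 1)[0]
            links.append(RefLink(name, line_number, match.group(1), target))
    return links


def validate_ref_links(project_root: pathlib.Path, links: list[RefLink]) -> list[str]:
    """Report every link whose target path does not exist under project_root."""
    return [
        f"{link.source_file}:{link.line_number}: Ref link to non-existent file '/ref/{link.target}'"
        for link in links
        if not (project_root / link.target).exists()
    ]


if __name__ == "__main__":
    sample = (
        "See [`Committer`](/ref/atr/web.py:Committer) and\n"
        "[`api.routses`](/ref/atr/api/routes.py).\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        # Hypothetical project tree in which only atr/web.py exists
        (root / "atr").mkdir()
        (root / "atr" / "web.py").write_text("")
        for error in validate_ref_links(root, extract_ref_links("example.md", sample)):
            print(error)
```

Running it prints a single error for `example.md:2`, the `api/routes.py` link, which is the same kind of stale reference this commit replaces with `api/__init__.py`. Because `pathlib.Path.exists` is also true for directories, a target such as `/ref/atr/tasks/` passes as long as the directory is present. Only `*.md` files are scanned, so `href="/ref/..."` attributes in the generated `.html` pages are not checked directly.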

