This is an automated email from the ASF dual-hosted git repository.

sbp pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tooling-trusted-releases.git


The following commit(s) were added to refs/heads/main by this push:
     new d846152  Add database documentation about core models and other 
features
d846152 is described below

commit d846152e30b4499819857bd288891b3e5b76782a
Author: Sean B. Palmer <[email protected]>
AuthorDate: Fri Oct 10 17:10:31 2025 +0100

    Add database documentation about core models and other features
---
 atr/docs/database.html | 25 +++++++++++++++++++++++++
 atr/docs/database.md   | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 atr/models/sql.py      |  1 +
 3 files changed, 74 insertions(+)

diff --git a/atr/docs/database.html b/atr/docs/database.html
index e274d6b..656fa73 100644
--- a/atr/docs/database.html
+++ b/atr/docs/database.html
@@ -5,6 +5,31 @@
 <p><strong>Sections</strong>:</p>
 <ul>
 <li><a href="#introduction">Introduction</a></li>
+<li><a href="#core-models">Core models</a></li>
+<li><a href="#other-features">Other features</a></li>
 </ul>
 <h2 id="introduction">Introduction</h2>
 <p>ATR stores all of its data in a SQLite database. The database schema is 
defined in <a href="/ref/atr/models/sql.py"><code>models.sql</code></a> using 
<a href="https://sqlmodel.tiangolo.com/";>SQLModel</a>, which uses <a 
href="https://docs.pydantic.dev/latest/";>Pydantic</a> for data validation and 
<a href="https://www.sqlalchemy.org/";>SQLAlchemy</a> for database operations. 
This page explains the main features of the database schema to help you 
understand how data is structured in ATR.</p>
+<h2 id="core-models">Core models</h2>
+<p>The three most important models in ATR are <a 
href="/ref/atr/models/sql.py:Committee"><code>Committee</code></a>, <a 
href="/ref/atr/models/sql.py:Project"><code>Project</code></a>, and <a 
href="/ref/atr/models/sql.py:Release"><code>Release</code></a>.</p>
+<p>A <a href="/ref/atr/models/sql.py:Committee"><code>Committee</code></a> 
represents a PMC or PPMC at the ASF. Each committee has a name, which is the 
primary key, and a full name for display purposes. Committees can have child 
committees, which is used for the relationship between the Incubator PMC and 
individual podling PPMCs. Committees also have lists of committee members and 
committers stored as JSON arrays.</p>
+<p>A <a href="/ref/atr/models/sql.py:Project"><code>Project</code></a> belongs 
to a committee and can have multiple releases. Projects have a name as the 
primary key, along with metadata such as a description and category and 
programming language tags. Each project can optionally have a <a 
href="/ref/atr/models/sql.py:ReleasePolicy"><code>ReleasePolicy</code></a> that 
defines how releases should be handled, including e.g. vote templates and 
GitHub workflow configuration.</p>
+<p>A <a href="/ref/atr/models/sql.py:Release"><code>Release</code></a> belongs 
to a project and represents a specific version of software which is voted on by 
a committee. The primary key is a name derived from the project name and 
version. Releases have a phase that indicates their current state in the 
release process, from draft composition to final publication. Each release can 
have multiple <a 
href="/ref/atr/models/sql.py:Revision"><code>Revision</code></a> instances 
before final pub [...]
+<h2 id="other-features">Other features</h2>
+<p>The models themselves are the most important components, but to support 
those models we need other components such as enumerations, column types, 
automatically populated fields, computed properties, and constraints.</p>
+<h3 id="enumerations">Enumerations</h3>
+<p>ATR uses Python enumerations to ensure that certain fields only contain 
valid values. The most important enumeration is <a 
href="/ref/atr/models/sql.py:ReleasePhase"><code>ReleasePhase</code></a>, which 
defines the four phases of a release: <code>RELEASE_CANDIDATE_DRAFT</code> for 
composing, <code>RELEASE_CANDIDATE</code> for voting, 
<code>RELEASE_PREVIEW</code> for finishing, and <code>RELEASE</code> for 
completed releases.</p>
+<p>The <a href="/ref/atr/models/sql.py:TaskStatus"><code>TaskStatus</code></a> 
enumeration defines the states a task can be in: <code>QUEUED</code>, 
<code>ACTIVE</code>, <code>COMPLETED</code>, or <code>FAILED</code>. The <a 
href="/ref/atr/models/sql.py:TaskType"><code>TaskType</code></a> enumeration 
lists all the different types of background tasks that ATR can execute, from 
signature checks to SBOM generation.</p>
+<p>The <a 
href="/ref/atr/models/sql.py:DistributionPlatform"><code>DistributionPlatform</code></a>
 enumeration is more complex, as each value contains not just a name but a <a 
href="/ref/atr/models/sql.py:DistributionPlatformValue"><code>DistributionPlatformValue</code></a>
 with template URLs and configuration for different package distribution 
platforms like PyPI, npm, and Maven Central.</p>
+<h3 id="special-column-types">Special column types</h3>
+<p>SQLite does not support all the data types we need, so we use SQLAlchemy 
type decorators to handle conversions. The <a 
href="/ref/atr/models/sql.py:UTCDateTime"><code>UTCDateTime</code></a> type 
ensures that all datetime values are stored in UTC and returned as 
timezone-aware datetime objects. When Python code provides a datetime with 
timezone information, the type decorator converts it to UTC before storing. 
When reading from the database, it adds back the UTC timezone information.</p>
+<p>The <a 
href="/ref/atr/models/sql.py:ResultsJSON"><code>ResultsJSON</code></a> type 
handles storing task results. It automatically serializes Pydantic models to 
JSON when writing to the database, and deserializes them back to the 
appropriate result model when reading.</p>
+<h3 id="automatic-field-population">Automatic field population</h3>
+<p>Some fields are populated automatically using SQLAlchemy event listeners. 
When a new <a href="/ref/atr/models/sql.py:Revision"><code>Revision</code></a> 
is created, the <a 
href="/ref/atr/models/sql.py:populate_revision_sequence_and_name"><code>populate_revision_sequence_and_name</code></a>
 function runs before the database insert. This function queries for the 
highest existing sequence number for the release, increments it, and sets both 
the <code>seq</code> and <code>number</code> fi [...]
+<p>The <a 
href="/ref/atr/models/sql.py:check_release_name"><code>check_release_name</code></a>
 function runs before inserting a release. If the release name is empty, it 
automatically generates it from the project name and version using the <a 
href="/ref/atr/models/sql.py:release_name"><code>release_name</code></a> helper 
function.</p>
+<h3 id="computed-properties">Computed properties</h3>
+<p>Some properties are computed dynamically rather than stored in the 
database. The <code>Release.latest_revision_number</code> property is 
implemented as a SQLAlchemy column property using a correlated subquery. This 
means that when you access <code>release.latest_revision_number</code>, 
SQLAlchemy automatically executes a query to find the highest revision number 
for that release. The query is defined once in <a 
href="/ref/atr/models/sql.py:RELEASE_LATEST_REVISION_NUMBER"><code>RELEASE [...]
+<p>Projects have many computed properties that provide access to release 
policy settings with appropriate defaults. For example, 
<code>Project.policy_start_vote_template</code> returns the custom vote 
template if one is configured, or falls back to 
<code>Project.policy_start_vote_default</code> if not. This pattern allows 
projects to customize their release process while providing sensible 
defaults.</p>
+<h3 id="constraints-and-validation">Constraints and validation</h3>
+<p>Database constraints ensure data integrity. The <a 
href="/ref/atr/models/sql.py:Task"><code>Task</code></a> model includes a check 
constraint that validates the status transitions. A task must start in 
<code>QUEUED</code> state, can only transition to <code>ACTIVE</code> when 
<code>started</code> and <code>pid</code> are set, and can only reach 
<code>COMPLETED</code> or <code>FAILED</code> when the <code>completed</code> 
timestamp is set. These constraints prevent invalid state transi [...]
+<p>Unique constraints ensure that certain combinations of fields are unique. 
The <code>Release</code> model has a unique constraint on <code>(project_name, 
version)</code> to prevent creating duplicate releases for the same project 
version. The <code>Revision</code> model has two unique constraints: one on 
<code>(release_name, seq)</code> and another on <code>(release_name, 
number)</code>, ensuring that revision numbers are unique within a release.</p>
diff --git a/atr/docs/database.md b/atr/docs/database.md
index 4e49028..cd5db57 100644
--- a/atr/docs/database.md
+++ b/atr/docs/database.md
@@ -9,7 +9,55 @@
 **Sections**:
 
 * [Introduction](#introduction)
+* [Core models](#core-models)
+* [Other features](#other-features)
 
 ## Introduction
 
 ATR stores all of its data in a SQLite database. The database schema is 
defined in [`models.sql`](/ref/atr/models/sql.py) using 
[SQLModel](https://sqlmodel.tiangolo.com/), which uses 
[Pydantic](https://docs.pydantic.dev/latest/) for data validation and 
[SQLAlchemy](https://www.sqlalchemy.org/) for database operations. This page 
explains the main features of the database schema to help you understand how 
data is structured in ATR.
+
+## Core models
+
+The three most important models in ATR are 
[`Committee`](/ref/atr/models/sql.py:Committee), 
[`Project`](/ref/atr/models/sql.py:Project), and 
[`Release`](/ref/atr/models/sql.py:Release).
+
+A [`Committee`](/ref/atr/models/sql.py:Committee) represents a PMC or PPMC at 
the ASF. Each committee has a name, which is the primary key, and a full name 
for display purposes. Committees can have child committees, which is used for 
the relationship between the Incubator PMC and individual podling PPMCs. 
Committees also have lists of committee members and committers stored as JSON 
arrays.
+
+A [`Project`](/ref/atr/models/sql.py:Project) belongs to a committee and can 
have multiple releases. Projects have a name as the primary key, along with 
metadata such as a description and category and programming language tags. Each 
project can optionally have a 
[`ReleasePolicy`](/ref/atr/models/sql.py:ReleasePolicy) that defines how 
releases should be handled, including e.g. vote templates and GitHub workflow 
configuration.
+
+A [`Release`](/ref/atr/models/sql.py:Release) belongs to a project and 
represents a specific version of software which is voted on by a committee. The 
primary key is a name derived from the project name and version. Releases have 
a phase that indicates their current state in the release process, from draft 
composition to final publication. Each release can have multiple 
[`Revision`](/ref/atr/models/sql.py:Revision) instances before final 
publication, representing iterations of the underl [...]
+
+## Other features
+
+The models themselves are the most important components, but to support those 
models we need other components such as enumerations, column types, 
automatically populated fields, computed properties, and constraints.
+
+### Enumerations
+
+ATR uses Python enumerations to ensure that certain fields only contain valid 
values. The most important enumeration is 
[`ReleasePhase`](/ref/atr/models/sql.py:ReleasePhase), which defines the four 
phases of a release: `RELEASE_CANDIDATE_DRAFT` for composing, 
`RELEASE_CANDIDATE` for voting, `RELEASE_PREVIEW` for finishing, and `RELEASE` 
for completed releases.
+
+The [`TaskStatus`](/ref/atr/models/sql.py:TaskStatus) enumeration defines the 
states a task can be in: `QUEUED`, `ACTIVE`, `COMPLETED`, or `FAILED`. The 
[`TaskType`](/ref/atr/models/sql.py:TaskType) enumeration lists all the 
different types of background tasks that ATR can execute, from signature checks 
to SBOM generation.
+
+The [`DistributionPlatform`](/ref/atr/models/sql.py:DistributionPlatform) 
enumeration is more complex, as each value contains not just a name but a 
[`DistributionPlatformValue`](/ref/atr/models/sql.py:DistributionPlatformValue) 
with template URLs and configuration for different package distribution 
platforms like PyPI, npm, and Maven Central.
+
+### Special column types
+
+SQLite does not support all the data types we need, so we use SQLAlchemy type 
decorators to handle conversions. The 
[`UTCDateTime`](/ref/atr/models/sql.py:UTCDateTime) type ensures that all 
datetime values are stored in UTC and returned as timezone-aware datetime 
objects. When Python code provides a datetime with timezone information, the 
type decorator converts it to UTC before storing. When reading from the 
database, it adds back the UTC timezone information.
+
+The [`ResultsJSON`](/ref/atr/models/sql.py:ResultsJSON) type handles storing 
task results. It automatically serializes Pydantic models to JSON when writing 
to the database, and deserializes them back to the appropriate result model 
when reading.
+
+### Automatic field population
+
+Some fields are populated automatically using SQLAlchemy event listeners. When 
a new [`Revision`](/ref/atr/models/sql.py:Revision) is created, the 
[`populate_revision_sequence_and_name`](/ref/atr/models/sql.py:populate_revision_sequence_and_name)
 function runs before the database insert. This function queries for the 
highest existing sequence number for the release, increments it, and sets both 
the `seq` and `number` fields. It also constructs the revision name by 
combining the release n [...]
+
+The [`check_release_name`](/ref/atr/models/sql.py:check_release_name) function 
runs before inserting a release. If the release name is empty, it automatically 
generates it from the project name and version using the 
[`release_name`](/ref/atr/models/sql.py:release_name) helper function.
+
+### Computed properties
+
+Some properties are computed dynamically rather than stored in the database. 
The `Release.latest_revision_number` property is implemented as a SQLAlchemy 
column property using a correlated subquery. This means that when you access 
`release.latest_revision_number`, SQLAlchemy automatically executes a query to 
find the highest revision number for that release. The query is defined once in 
[`RELEASE_LATEST_REVISION_NUMBER`](/ref/atr/models/sql.py:RELEASE_LATEST_REVISION_NUMBER)
 and attached [...]
+
+Projects have many computed properties that provide access to release policy 
settings with appropriate defaults. For example, 
`Project.policy_start_vote_template` returns the custom vote template if one is 
configured, or falls back to `Project.policy_start_vote_default` if not. This 
pattern allows projects to customize their release process while providing 
sensible defaults.
+
+### Constraints and validation
+
+Database constraints ensure data integrity. The 
[`Task`](/ref/atr/models/sql.py:Task) model includes a check constraint that 
validates the status transitions. A task must start in `QUEUED` state, can only 
transition to `ACTIVE` when `started` and `pid` are set, and can only reach 
`COMPLETED` or `FAILED` when the `completed` timestamp is set. These 
constraints prevent invalid state transitions at the database level.
+
+Unique constraints ensure that certain combinations of fields are unique. The 
`Release` model has a unique constraint on `(project_name, version)` to prevent 
creating duplicate releases for the same project version. The `Revision` model 
has two unique constraints: one on `(release_name, seq)` and another on 
`(release_name, number)`, ensuring that revision numbers are unique within a 
release.
diff --git a/atr/models/sql.py b/atr/models/sql.py
index 01e7581..d3f25c3 100644
--- a/atr/models/sql.py
+++ b/atr/models/sql.py
@@ -280,6 +280,7 @@ class ResultsJSON(sqlalchemy.types.TypeDecorator):
         try:
             return results.ResultsAdapter.validate_python(value)
         except pydantic.ValidationError:
+            # TODO: Should we make this more strict?
             return None
 
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to