Copilot commented on code in PR #64610:
URL: https://github.com/apache/airflow/pull/64610#discussion_r3030491326
##########
airflow-core/src/airflow/api_fastapi/common/parameters.py:
##########
@@ -267,6 +269,65 @@ def depends_search(
return depends_search
+class _RegexParam(BaseParam[str]):
+ """Filter using database-level regex matching (regexp_match).
+
+ SQLAlchemy's ``regexp_match`` is supported on PostgreSQL (``~`` operator),
+ MySQL/MariaDB (``REGEXP``), and SQLite (via Python ``re.match``).
+ Note: SQLite uses ``re.match`` semantics (anchored at the start of the
+ string), while PostgreSQL uses ``re.search`` (matches anywhere).
Review Comment:
The new regex parameter docs claim SQLite support via Python `re.match`, but
the public OpenAPI spec generated for `/api/v2/assets/events` currently says
the filter is “Not supported on SQLite”. Also, elsewhere in the repo SQLite is
treated as lacking regex support. Please align these statements and ensure the
actual runtime behavior is clearly defined (either document it as unsupported
on SQLite, or explicitly provide/describe the SQLite REGEXP implementation and
its anchored semantics).
```suggestion
SQLAlchemy's ``regexp_match`` is supported on PostgreSQL (``~`` operator)
and MySQL/MariaDB (``REGEXP``). SQLite is not supported.
```
##########
airflow-core/src/airflow/api_fastapi/execution_api/routes/asset_events.py:
##########
@@ -73,6 +74,22 @@ def _get_asset_events_through_sql_clauses(
)
+def _validate_partition_key_pattern(partition_key_pattern: str | None) -> None:
+ """Validate that the partition_key_pattern is a syntactically correct
regex."""
+ if partition_key_pattern is None:
+ return
+ try:
+ re.compile(partition_key_pattern)
+ except re.error as e:
Review Comment:
This pre-validation uses Python `re.compile()`, but the regex is executed by
the database regex engine, which is not fully compatible with Python regex
syntax. That means some DB-valid patterns may be rejected (or DB-invalid
patterns may slip through). Consider handling validation in a DB-aware way, or
catching DB-side regex errors and translating them to a 400 response.
##########
airflow-core/src/airflow/api_fastapi/execution_api/routes/asset_events.py:
##########
@@ -44,7 +45,7 @@ def _get_asset_events_through_sql_clauses(
) -> AssetEventsResponse:
order_by_clause = AssetEvent.timestamp.asc() if ascending else
AssetEvent.timestamp.desc()
asset_events_query =
select(AssetEvent).join(join_clause).where(where_clause).order_by(order_by_clause)
- if limit:
+ if limit is not None:
asset_events_query = asset_events_query.limit(limit)
asset_events = session.scalars(asset_events_query)
Review Comment:
This query joins `AssetEvent.asset`, but `AssetEvent.asset` is configured
`lazy="select"`; accessing `event.asset` inside the response-building loop will
trigger an extra SELECT per event (N+1). Since this endpoint is
performance-sensitive (and now adds regex filtering), consider eager-loading
the relationship (e.g., `contains_eager`/`joinedload`) to avoid per-row DB
roundtrips.
##########
airflow-core/src/airflow/api_fastapi/core_api/openapi/v2-rest-api-generated.yaml:
##########
@@ -340,6 +340,18 @@ paths:
- type: integer
- type: 'null'
title: Source Map Index
+ - name: partition_key_pattern
+ in: query
+ required: false
+ schema:
+ anyOf:
+ - type: string
+ - type: 'null'
+ description: Regular expression pattern for filtering. Uses
database-native
+ regex (PostgreSQL ~ operator, MySQL REGEXP). Not supported on
SQLite.
+ title: Partition Key Pattern
+ description: Regular expression pattern for filtering. Uses
database-native
+ regex (PostgreSQL ~ operator, MySQL REGEXP). Not supported on SQLite.
Review Comment:
The generated OpenAPI description for `partition_key_pattern` says regex
filtering is “Not supported on SQLite”, but the server-side parameter factory /
execution API route descriptions mention SQLite `re.match` semantics. Please
make these consistent so clients know what to expect on SQLite.
```suggestion
regex (PostgreSQL ~ operator, MySQL REGEXP). On SQLite, uses
Python
re.match semantics.
title: Partition Key Pattern
description: Regular expression pattern for filtering. Uses
database-native
regex (PostgreSQL ~ operator, MySQL REGEXP). On SQLite, uses Python
re.match semantics.
```
##########
airflow-core/src/airflow/api_fastapi/common/parameters.py:
##########
@@ -267,6 +269,65 @@ def depends_search(
return depends_search
+class _RegexParam(BaseParam[str]):
+ """Filter using database-level regex matching (regexp_match).
+
+ SQLAlchemy's ``regexp_match`` is supported on PostgreSQL (``~`` operator),
+ MySQL/MariaDB (``REGEXP``), and SQLite (via Python ``re.match``).
+ Note: SQLite uses ``re.match`` semantics (anchored at the start of the
+ string), while PostgreSQL uses ``re.search`` (matches anywhere).
+ """
+
+ def __init__(self, attribute: ColumnElement, skip_none: bool = True) ->
None:
+ super().__init__(skip_none=skip_none)
+ self.attribute: ColumnElement = attribute
+
+ def to_orm(self, select: Select) -> Select:
+ if self.value is None and self.skip_none:
+ return select
+ return select.where(self.attribute.regexp_match(self.value))
+
+ @classmethod
+ def depends(cls, *args: Any, **kwargs: Any) -> Self:
+ raise NotImplementedError("Use regex_param_factory instead, depends is
not implemented.")
+
+
+def _validate_regex_pattern(value: str | None) -> str | None:
+ """Validate that the regex pattern is syntactically correct."""
+ if value is None:
+ return value
+ try:
+ re.compile(value)
+ except re.error as e:
+ raise HTTPException(
+ status_code=status.HTTP_400_BAD_REQUEST,
+ detail=f"Invalid regular expression: {e}",
+ )
+ return value
Review Comment:
Regex validation here uses Python `re.compile()`, but the actual matching is
executed by the database regex engine (PostgreSQL, MySQL/MariaDB, etc.), which
has different syntax/feature support. This can incorrectly reject patterns that
are valid for the DB (or accept patterns that will later fail at query time).
Consider either (a) removing the pre-validation and surfacing DB errors as
400s, or (b) making validation dialect-aware, or at least documenting that
validation is Python-regex based and may differ from the DB.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]