kaxil commented on code in PR #54341:
URL: https://github.com/apache/airflow/pull/54341#discussion_r2285582855
##########
airflow-core/src/airflow/api_fastapi/common/parameters.py:
##########
@@ -448,6 +449,38 @@ def depends(cls, owners: list[str] =
Query(default_factory=list)) -> _OwnersFilt
return cls().set_value(owners)
+class _TimetableTypesFilter(BaseParam[list[str]]):
+ """Filter on timetable type."""
+
+ def to_orm(self, select: Select) -> Select:
+ if self.skip_none is False:
+ raise ValueError(f"Cannot set 'skip_none' to False on a
{type(self)}")
+
+ if not self.value: # No filter provided
+ return select
+
+ select = select.join(SerializedDagModel, SerializedDagModel.dag_id ==
DagModel.dag_id)
+
+ # JSON path for timetable type
+ json_path = "$.dag.timetable.__type"
Review Comment:
I can think of a few better solutions that don't need this hardcoded
`json_path` (which is susceptible to change across serialized versions and
makes the code brittle).
## Alternative Approaches:
### 1) Store `timetable_type` in DagModel
Add a `timetable_type` column to the `dag` table, similar to existing
`timetable_summary` and `timetable_description`.
**Pros:** Existing pattern, Fast SQL-level filtering, efficient pagination,
scales well
**Cons:** Requires database migration, data duplication
### 2) Use **`LazyDeserializedDAG`**
```py
def get_timetable_type(dag_model: DagModel) -> str | None:
if not dag_model.serialized_dag:
return None
try:
lazy_dag = LazyDeserializedDAG(data=dag_model.serialized_dag.data)
timetable = lazy_dag.timetable
return
f"{timetable.__class__.__module__}.{timetable.__class__.__name__}"
except Exception:
return None
def filter_dags_by_timetable_type(dag_models: list[DagModel],
timetable_types: list[str]) -> list[DagModel]:
if not timetable_types:
return dag_models
filtered_dags = []
for dag_model in dag_models:
dag_timetable_type = get_timetable_type(dag_model)
if dag_timetable_type in timetable_types:
filtered_dags.append(dag_model)
return filtered_dags
```
**Pros:** No schema changes, more robust than JSON paths, uses existing
serialized data
**Cons:** Post-query filtering only, deserialization overhead, pagination
complexity
### 3) Use **`DagBagDep`** - Use existing DagBag dependency injection
Leverage the existing `DagBagDep` already used in individual DAG endpoints.
**Pros:** Uses deserialized dags, future-proof, follows existing patterns
**Cons:** Potential N+1 query issues since dags are lazyily loaded,
post-query filtering only, DagBag loading overhead
What do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]