Re: [PR] Add API Support for Filtering DAGs by Timetable Type [airflow]

via GitHub Tue, 19 Aug 2025 08:13:30 -0700


kaxil commented on code in PR #54341:
URL: https://github.com/apache/airflow/pull/54341#discussion_r2285582855



##########
airflow-core/src/airflow/api_fastapi/common/parameters.py:
##########
@@ -448,6 +449,38 @@ def depends(cls, owners: list[str] = 
Query(default_factory=list)) -> _OwnersFilt
         return cls().set_value(owners)
 
 
+class _TimetableTypesFilter(BaseParam[list[str]]):
+    """Filter on timetable type."""
+
+    def to_orm(self, select: Select) -> Select:
+        if self.skip_none is False:
+            raise ValueError(f"Cannot set 'skip_none' to False on a 
{type(self)}")
+
+        if not self.value:  # No filter provided
+            return select
+
+        select = select.join(SerializedDagModel, SerializedDagModel.dag_id == 
DagModel.dag_id)
+
+        # JSON path for timetable type
+        json_path = "$.dag.timetable.__type"

Review Comment:
   I can think of a few better solutions that don't need this hardcoded 
`json_path` (which is susceptible to change across serialized versions and 
makes the code brittle).
   
   ## Alternative Approaches:
   
   ### 1) Store `timetable_type` in DagModel
   Add a `timetable_type` column to the `dag` table, similar to existing 
`timetable_summary` and `timetable_description`.
   
   **Pros:** Existing pattern, Fast SQL-level filtering, efficient pagination, 
scales well  
   **Cons:** Requires database migration, data duplication
   
   ### 2) Use **`LazyDeserializedDAG`**
   ```py
   def get_timetable_type(dag_model: DagModel) -> str | None:
       if not dag_model.serialized_dag:
           return None
       
       try:
           lazy_dag = LazyDeserializedDAG(data=dag_model.serialized_dag.data)
           timetable = lazy_dag.timetable
           return 
f"{timetable.__class__.__module__}.{timetable.__class__.__name__}"
       except Exception:
           return None
   
   def filter_dags_by_timetable_type(dag_models: list[DagModel], 
timetable_types: list[str]) -> list[DagModel]:
       if not timetable_types:
           return dag_models
       
       filtered_dags = []
       for dag_model in dag_models:
           dag_timetable_type = get_timetable_type(dag_model)
           if dag_timetable_type in timetable_types:
               filtered_dags.append(dag_model)
       
       return filtered_dags
   ```
   
   **Pros:** No schema changes, more robust than JSON paths, uses existing 
serialized data  
   **Cons:** Post-query filtering only, deserialization overhead, pagination 
complexity
   
   ### 3) Use **`DagBagDep`** - Use existing DagBag dependency injection
   Leverage the existing `DagBagDep` already used in individual DAG endpoints.
   
   **Pros:** Uses deserialized dags, future-proof, follows existing patterns  
   **Cons:** Potential N+1 query issues since dags are lazyily loaded, 
post-query filtering only, DagBag loading overhead
   
   
   What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add API Support for Filtering DAGs by Timetable Type [airflow]

Reply via email to