GitHub user Leondon9 created a discussion: Support waiting for asset partition 
readiness from time-scheduled DAGs

I am looking for guidance before attempting an implementation.

In time-scheduled DW/DataLake DAGs, `data_interval_start` and 
`data_interval_end` are often still the primary processing contract. 
Asset-triggered DAGs are useful, but switching a heavy batch consumer to 
`schedule=[Asset(...)]` changes the run semantics because asset-triggered runs 
intentionally do not have a logical date or data interval.

The use case I am trying to model is:

- keep the consumer DAG time-scheduled, for example `@daily`
- keep SQL/business logic based on the consumer's data interval
- wait for an upstream Airflow asset partition, for example `orders / 
dt=2026-05-21`
- avoid coupling the consumer to the producer DAG's exact schedule via 
`ExternalTaskSensor(execution_date_fn=...)`

Today I can solve this with provider-specific sensors, such as S3, Hive, 
BigQuery, or Databricks partition sensors, or with `ExternalTaskSensor`. But I 
could not find an Airflow-native task-level way to wait for an `AssetEvent` / 
asset partition recorded in Airflow metadata.

A concrete shape might look like this:

```python
orders = Asset("warehouse://orders")

wait_orders = AssetPartitionSensor(
    task_id="wait_orders",
    asset=orders,
    partition_key="{{ data_interval_start | ds }}",
    deferrable=True,
)
```

The intended semantics would be:

- the consumer DAG is still created by its time schedule
- the sensor waits for an Airflow asset event matching `asset + partition_key`
- the consumer can continue using its own `data_interval_start` / 
`data_interval_end`
- the producer DAG can be scheduled, manual, backfilled, or otherwise changed, 
as long as it emits the expected asset partition event

This is not intended to replace asset-triggered scheduling, and I am not 
proposing to derive `data_interval` for asset-triggered DAG runs. I am trying 
to clarify whether there is room for a time-scheduled DAG to wait on asset 
partition readiness as a task-level dependency.

This seems related to the problem discussed in #55489, where users need a 
common date/partition concept across scheduled, triggered, and asset-aware 
workflows. The maintainer comments there also point toward asset partitions / 
`dag_run.partition_key` as the more general direction.

Questions:

1. Is this gap already intended to be covered by asset partitions in Airflow 
3.2+?
2. If not, would a task-level asset partition readiness sensor fit Airflow's 
direction?
3. Should such a feature live in the standard provider, core, or only be 
documented as a pattern using provider-specific sensors?
4. What semantics would be acceptable: latest event exists, any event exists, 
source-run-aware matching, or something else?

If this direction makes sense, I would be interested in helping with a small 
scoped follow-up, likely starting with docs or a design issue before any 
implementation PR.


GitHub link: https://github.com/apache/airflow/discussions/67375

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

Reply via email to