Lee-W opened a new issue, #67368:
URL: https://github.com/apache/airflow/issues/67368
### Body
## Context
In #66782 we wire up server-side consumption of task-emitted partition keys.
The resulting `AssetEvent.partition_key` depends on whether the task touched
`outlet_events[asset]`:
| Task does... | `AssetEvent.partition_key` |
|---|---|
| nothing | inherits `DagRun.partition_key` |
| sets `extra` only, no `add_partitions()` | `None` |
| `add_partitions("us", "eu")` | `"us"`, `"eu"` |
Rows 1 and 2 look the same from the Dag author's side ("I didn't say
anything about partitions") but produce different events. Before we lock this
in for the 3.2 release, we should pin down what each field means.
## Proposed framing
- **`DagRun.partition_key` = provenance.** The partition this run was
triggered on. Not a default for events the run produces.
- **`AssetEvent.partition_key` = routing pointer.** The key downstream
partition mappers consume to decide which downstream DagRun to queue.
## What that implies
- Silent task → event inherits `DagRun.partition_key`. "I didn't say
anything" is Identity passthrough of the run's own routing key.
- Row 2 is a bug under this framing — the producer was on a partition slice
but left no routing pointer, so downstream partitioned consumers can never
trigger. Normalise to row 1.
- `add_partitions("us", "eu")` is explicit fan-out — one event per routing
pointer.
- `outlet_events[alias].add_partitions(...)` → no-op + warning. Aliases
expand at queue time and have no routing of their own.
- Manual `partition_key` on a non-partitioned Dag → reject.
- Runtime (`PartitionAtRuntime`) silent task → event is `None`. No
provenance to inherit.
- Runtime back-fill of `DagRun.partition_key` from emitted keys is wrong
under provenance framing — the run wasn't triggered on those keys, it
discovered them. Drop it.
- n-1: downstream `DagRun.partition_key` (provenance, target space) and
consumed `AssetEvent.partition_key` (routing, source space) are different by
design. The mapping stays in `PartitionedAssetKeyLog`.
- 1-n: same routing pointer, different target per downstream via each
downstream's mapper. Upstream read source off the event, downstream read target
off the triggered DagRun.
## Today vs. proposal
| Rule | Today | Change if accepted |
|---|---|---|
| Silent task inherits run pk | `_register` fallback in `taskinstance.py`
already does this | no change |
| Row 2 normalised | per-emission `partition_key=None` is preserved, event
ends up `None` | in `_register`, when
`payload.partition_key is None`, substitute `dag_run_partition_key` |
| `add_partitions(...)` fan-out | works | no change |
| Alias `add_partitions` | silently dropped in `_serialize_outlet_events`;
only the manager comment hints at it | warn (and
ignore) in `OutletEventAccessor.add_partitions` when key is
`AssetAliasUniqueKey` |
| Manual pk on non-partitioned Dag | accepted, stored, ignored | reject at
REST/CLI trigger layer based on `timetable.partitioned`
/ `partitioned_at_runtime` |
| Runtime silent task → `None` | depends on whether back-fill has fired
first this turn | falls out for free once back-fill is
dropped |
| Runtime `DagRun.partition_key` back-fill | `taskinstance.py:1519-1525`
back-fills when exactly 1 distinct emitted pk | drop the
block |
| n-1 / 1-n key spaces | implemented via `PartitionedAssetKeyLog` | doc-only
— make the source/target distinction explicit in
`assets.rst` |
All of this is unreleased (3.2.0 ships AIP-76), so changes can land in
place — no compat shim needed.
Related: #59295, #59050, #67239, #66782.
### Committer
- [x] I acknowledge that I am a maintainer/committer of the Apache Airflow
project.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]