guan404ming commented on PR #61398:
URL: https://github.com/apache/airflow/pull/61398#issuecomment-3878554617
> Can you explain the motivation behind creating a full separate API? And
not improve the existing 'get_dag_runs' ? Same in the UI for having separate
tables and tabs etc.... Are they so different? Sorry I'm missing some context
and probably asking dumb questions :p
Great question, not dumb at all! Here is why I kept them separate:
`DagRun` and `AssetPartitionDagRun` live in different tables and serve
different purposes. Based on my understanding, `DagRun` is an actual execution
record (~25 columns, full lifecycle tracking). `AssetPartitionDagRun` is kind
of a lightweight pre-execution record (6 columns) that accumulates asset events
for a partition until conditions are met to create a real `DagRun`.
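
To make the "accumulate until ready" idea concrete, here is a minimal sketch
in plain Python. It is not the Airflow model: `PendingPartition`,
`required_assets`, `received_assets`, and `record_event` are hypothetical
names I made up for illustration, and the real readiness condition may be
more involved than "every required asset has fired".

```python
from dataclasses import dataclass, field


@dataclass
class PendingPartition:
    """Hypothetical stand-in for a lightweight pre-execution record."""

    partition_key: str
    required_assets: frozenset[str]
    received_assets: set[str] = field(default_factory=set)

    def record_event(self, asset_name: str) -> bool:
        """Record an asset event for this partition.

        Returns True once every required asset has fired at least once,
        which is the point where a real run could be created (assumption).
        """
        # Ignore events for assets this partition does not depend on.
        if asset_name in self.required_assets:
            self.received_assets.add(asset_name)
        return self.received_assets >= self.required_assets
```

With two required assets, the first event leaves the partition pending and
only the second distinct one makes it ready.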
The `partitioned` endpoint needs joins across multiple tables plus
subqueries to compute asset progress (total_received/total_required), while the
`DagRun` endpoint is a straightforward single-table query. The response shapes
are also different: one shows execution details (state, dates, duration, conf),
the other shows asset-accumulation progress (partition_key, which assets have
fired, how many are still needed). Additionally, `get_dag_runs` is a public API
while `get_partitioned_dag_runs` is UI-only, so I think mixing them would
pollute the stable public surface with internal UI-specific fields.
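
As a rough illustration of how different the two response shapes are, here is
a sketch with plain dataclasses. The field names come from the description
above, but the class names, the types, and the `compute_progress` helper are
assumptions for illustration only, not the actual API schemas or query.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class DagRunSummary:
    """Execution details (illustrative subset, not the real schema)."""

    state: str
    start_date: Optional[datetime]
    end_date: Optional[datetime]
    duration: Optional[float]
    conf: dict[str, Any]


@dataclass(frozen=True)
class PartitionedRunSummary:
    """Asset-accumulation progress (illustrative, not the real schema)."""

    partition_key: str
    received_assets: tuple[str, ...]
    total_received: int
    total_required: int


def compute_progress(
    partition_key: str,
    required_assets: Iterable[str],
    events: Iterable[tuple[str, str]],
) -> PartitionedRunSummary:
    """Summarise progress from (partition_key, asset_name) event pairs.

    In the real endpoint this is done with joins and subqueries; here it is
    an in-memory stand-in that counts each required asset at most once.
    """
    required = set(required_assets)
    received = {
        asset
        for key, asset in events
        if key == partition_key and asset in required
    }
    return PartitionedRunSummary(
        partition_key=partition_key,
        received_assets=tuple(sorted(received)),
        total_received=len(received),
        total_required=len(required),
    )
```

Nothing in `PartitionedRunSummary` overlaps with `DagRunSummary`, which is
the main reason a shared response model felt awkward to me.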
Please let me know if I have misunderstood anything or if you have any
further questions, thanks!
PS: here is my
[note](https://www.notion.so/Guan-Ming-s-Partition-Note-303ea814fad680de8542c2dec599fb2b?source=copy_link)
on the `partition`-related implementation. I hope it helps you understand
the context better!