blag commented on code in PR #29441:
URL: https://github.com/apache/airflow/pull/29441#discussion_r1102128972
##########
airflow/www/views.py:
##########
@@ -3715,7 +3715,6 @@ def next_run_datasets(self, dag_id):
DatasetEvent,
and_(
DatasetEvent.dataset_id == DatasetModel.id,
- DatasetEvent.timestamp > DatasetDagRunQueue.created_at,
Review Comment:
😬 It's been a long minute since I wrote this, but...
I believe when I wrote this the intent with the `lastUpdate` field was to
only show last updates _since the last time the dag was queued/run_. But yeah,
the `lastUpdate` label isn't descriptive enough for that.
Option 1: Personally, I would consider changing the name of what the
`lastUpdate` field is rendered to, something like "Last update since last run"
or something more wordy.
Option 2: But if you don't want to do that, and you want to display the last
update for every dataset regardless of whether has already been "consumed" by a
DagRun (eg: in either the DatasetDagRunQueue or actually scheduled into a
DagRun), then yeah it makes sense to remove this filter. However, I would also
remove the `and_` around it since then there would only be one filter condition
in that join:
```python
.join(
DatasetEvent,
DatasetEvent.dataset_id == DatasetModel.id,
isouter=True,
)
```
If you go for option 2, I think you should be able to compare the existence
and creation time of the DDRQ with the DatasetEvent timestamp to figure out
whether or not the last update time has already triggered a DagRun or if it has
partially satisfied the conditions of a future DagRun.
Hope this makes sense.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]