dstandish commented on PR #50984:
URL: https://github.com/apache/airflow/pull/50984#issuecomment-2914266370

   @pierrejeambrun re
   
   > We shouldn't have to do this because it can yield wrong results. We are 
capable of emitting a query that will select 1 Run per dag_id, the one that has 
the max start_date and in case of multiple rows for the same dag_id and same 
max start_date will then choose the single row with the max_dag_run_id as a 
second criteria. (maybe with a window function over the partition of dagrun 
with the latest_start date or two nested subqueries)
   
   Yes, it is _possible_ to write such a query, but it would be expensive.  
The current approach is a simplification that would generally give the same 
result.  It would always show the most recently created run.  The one case 
where it differs: if you cleared an old dag run, it would still show the most 
recently created dag run rather than the cleared one. 
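
   To make the trade-off concrete, here is a minimal sketch that compares the two approaches on an in-memory SQLite database. It assumes the simplification amounts to picking the highest run id per dag_id; the `dag_run` table below is a hypothetical, stripped-down stand-in and not Airflow's real schema, and the dates are made up. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical, simplified stand-in for the dag_run table (not Airflow's schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run (id INTEGER PRIMARY KEY, dag_id TEXT, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO dag_run (id, dag_id, start_date) VALUES (?, ?, ?)",
    [
        (1, "dag_a", "2025-05-03T00:00:00"),  # old run, cleared and restarted later
        (2, "dag_a", "2025-05-02T00:00:00"),  # most recently created run
        (3, "dag_b", "2025-05-01T00:00:00"),
        (4, "dag_b", "2025-05-01T00:00:00"),  # ties on start_date with run 3
    ],
)

# One row per dag_id: max start_date first, then max id as the tie-breaker.
WINDOW_QUERY = """
SELECT dag_id, id FROM (
    SELECT dag_id, id,
           ROW_NUMBER() OVER (
               PARTITION BY dag_id ORDER BY start_date DESC, id DESC
           ) AS rn
    FROM dag_run
)
WHERE rn = 1
ORDER BY dag_id
"""

# Assumed simplification: the most recently created run, i.e. the highest id.
MAX_ID_QUERY = """
SELECT dag_id, MAX(id) FROM dag_run GROUP BY dag_id ORDER BY dag_id
"""

by_window = conn.execute(WINDOW_QUERY).fetchall()
by_max_id = conn.execute(MAX_ID_QUERY).fetchall()

print("window:", by_window)
print("max id:", by_max_id)
```

   The two results agree for `dag_b` (the tie is broken by id either way) and differ only for `dag_a`, where the cleared older run has the later start_date.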
   
   So yeah, we could do the more complicated query, but to me it doesn't really 
seem worth it.  What do you think, @jedcunningham?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
