bastienmenis commented on PR #57342:
URL: https://github.com/apache/airflow/pull/57342#issuecomment-3453858551

   > Is there a practical use for data interval start/end in your use case that 
you need it for? Other than adding consistency?
   > 
   > In my view the parameters are mainly there for historical scheduled use 
cases to define the data intervals. Manually triggering a Dag is in almost all 
cases (looking for an exception) out of such interval bounds. I would expect if 
you want to manually run such things that it is a special advanced case _or_ 
that making a backfill is the right way to go.
   > 
   > I am asking for a real use case because the trigger form is already 
over-loaded with options (in my view) and I fear that there are more users 
confused by the options than it brings a benefit. I'd accept it if there is a 
real use case driving this addition. There might be.
   
   @jscheffl Hi Jens, thanks for your response. Indeed, I do have a use-case in 
mind!
   
   We have a DAG that aggregates time series data. It is scheduled on a custom 
timetable: it runs once a day, and each run uses a data interval covering the 
previous 72 hours.
   
   What we would like to do is run a backfill, but a "normal" backfill would be 
wasteful as it would process the data for each day three times (since the data 
intervals of consecutive runs overlap). So it would be useful to be able to 
trigger the DAG manually with a custom data interval that differs from the 
default one set by the timetable.
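
   To make the overlap concrete, here is a rough sketch in plain Python (no 
Airflow imports). The dates, the function names and the 72-hour chunk size for 
the manual runs are assumptions chosen for illustration, not our actual 
timetable code:

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=72)  # data interval length of each scheduled run
STEP = timedelta(days=1)      # one scheduled run per day


def scheduled_intervals(first_run, last_run):
    """Intervals a timetable-driven backfill covers: one 72 h window per daily run."""
    intervals = []
    run = first_run
    while run <= last_run:
        intervals.append((run - WINDOW, run))
        run += STEP
    return intervals


def custom_intervals(start, end, chunk=WINDOW):
    """Non-overlapping intervals covering [start, end), one per manual run."""
    intervals = []
    cursor = start
    while cursor < end:
        upper = min(cursor + chunk, end)
        intervals.append((cursor, upper))
        cursor = upper
    return intervals


def total_hours(intervals):
    """Sum of the interval lengths, in hours."""
    return sum((hi - lo) / timedelta(hours=1) for lo, hi in intervals)


# Hypothetical backfill range: daily runs from 4 Jan to 12 Jan 2025 (UTC).
first = datetime(2025, 1, 4, tzinfo=timezone.utc)
last = datetime(2025, 1, 12, tzinfo=timezone.utc)

scheduled = scheduled_intervals(first, last)
custom = custom_intervals(first - WINDOW, last)

print(len(scheduled), total_hours(scheduled))  # 9 648.0
print(len(custom), total_hours(custom))        # 4 264.0
```

   Both lists cover the same 11 days (264 hours) of data, but the 
timetable-driven runs read 648 hours in total, because every day away from the 
edges of the range falls into three consecutive windows.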
   
   Of course we could create a separate DAG just for the purpose of 
backfilling, but that doesn't feel as neat as using the same DAG.
   
   Here is how I understand the use cases for manually triggering DAG runs:
   - If the user wants to trigger runs as per the timetable, they use the 
"Backfill" option.
   - If the user wants to trigger a run independently of the timetable, they 
use the "Single run" option. In that case, since the logic that determines the 
data interval for each run is tied to the timetable, it feels like prompting 
the user for the data interval makes sense. I'd actually suggest that those 
fields could sit in the main form, rather than in the "Advanced option" section.
   
   Of course that is my interpretation based on our narrow use case 😆. We 
haven't been using Airflow for long, so I understand that our usage might differ 
from that of most users.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
