bastienmenis commented on PR #57342: URL: https://github.com/apache/airflow/pull/57342#issuecomment-3453858551
> Is there a practical use for data interval start/end in your use case that you need it for? Other than adding consistency?
>
> In my view the parameters are mainly covered for historical scheduled use cases to define the data intervals. Manually triggering a Dag is in almost all cases (looking for an exception) out of such interval bounds. I would expect if you want to manually run such things that it is a special advanced case _or_ the option to make a backfill are the right way to go.
>
> I am asking for a real use case because the trigger form is already over-loaded with options (in my view) and I fear that there are more users confused by the options than it brings a benefit. I'd accept it if there is a real use case driving this addition. There might be.

@jscheffl Hi Jens, thanks for your response. Indeed, I do have a use case in mind!

We have a DAG responsible for aggregating time-series data. It is scheduled on a custom timetable: it runs once a day and uses a data interval covering the previous 72 hours. What we would like to do is run a backfill, but a "normal" backfill would be wasteful: because the data intervals of consecutive runs overlap, each day of data would be processed three times. So it would be useful to be able to trigger the DAG manually with a custom data interval that differs from the default one set by the timetable. Of course we could create a separate DAG just for backfilling, but that doesn't feel as neat as reusing the same DAG.

Here is how I understand the use cases for manually triggering DAG runs:

- If the user wants to trigger runs as per the timetable, they use the "Backfill" option.
- If the user wants to trigger a run independently of the timetable, they use the "Single run" option. In that case, since the logic that determines the data interval for each run is tied to the timetable, it makes sense to prompt the user for the data interval.
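To make the overlap in our use case concrete, here is a small sketch in plain Python. It is only a hypothetical model, not Airflow's Timetable API: `data_interval`, `days_in` and `coverage_per_day` are illustrative names, and it assumes each run fires at midnight with a 72-hour data interval ending at the run date.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# Hypothetical model of our timetable: one run per day, each run's data
# interval covering the 72 hours before the run date (midnight-aligned).
WINDOW = timedelta(hours=72)


def data_interval(run_date: date) -> tuple[datetime, datetime]:
    """Return the half-open (start, end) data interval for a run on run_date."""
    end = datetime(run_date.year, run_date.month, run_date.day)
    return end - WINDOW, end


def days_in(start: datetime, end: datetime) -> list[date]:
    """List the calendar days in [start, end); assumes start is at midnight."""
    days = []
    current = start
    while current < end:
        days.append(current.date())
        current += timedelta(days=1)
    return days


def coverage_per_day(first_run: date, last_run: date) -> Counter:
    """Count how many scheduled runs process each day in a backfill of first_run..last_run."""
    counts: Counter = Counter()
    run = first_run
    while run <= last_run:
        counts.update(days_in(*data_interval(run)))
        run += timedelta(days=1)
    return counts


if __name__ == "__main__":
    backfill = coverage_per_day(date(2024, 1, 4), date(2024, 1, 10))
    manual = days_in(datetime(2024, 1, 1), datetime(2024, 1, 10))
    print("day-slices processed by backfill:", sum(backfill.values()))
    print("day-slices processed by one manual run:", len(manual))
```

Backfilling the seven runs from 2024-01-04 to 2024-01-10 processes 21 day-slices, with every interior day handled three times, whereas a single manual run with a custom interval from 2024-01-01 to 2024-01-10 covers the same nine days once each.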
I'd actually suggest that those fields could sit in the main form rather than in the "Advanced option" section. Of course, that is my interpretation based on our narrow use case 😆. We haven't been using Airflow for long, so I understand that our usage might differ from that of the majority of users.
