A possible (hacky?) workaround would be:

1. Create a DAG that is scheduled to run every minute, with max_active_runs=2 and catchup=False.
2. Make the first task in the DAG the sensor that waits for the condition under which the target DAG should be triggered, and set task_concurrency to 1 on it.
3. As the second task (downstream of the first), add the TriggerDagRunOperator.
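With max_active_runs=2 in step 1, two runs of the polling DAG can be active at once, so the check performed by the sensor in step 2 should hand out each external event only once. Below is a minimal sketch of one way to do that, not a definitive implementation. Everything in it is an assumption made for illustration: the external event is taken to be a new file in an inbox directory, claims are recorded as marker files on a local filesystem, and the names claim_event, poke, build_polling_dag, poll_and_trigger and the directory arguments are hypothetical. The Airflow import paths are the ones I believe apply to 1.10.x; they are different in 2.x.

```python
import os


def claim_event(marker_dir, event_id):
    """Return True only for the first caller that claims event_id.

    O_CREAT | O_EXCL makes creating the marker file atomic on a local
    filesystem, so two overlapping sensor runs cannot both claim it.
    """
    os.makedirs(marker_dir, exist_ok=True)
    marker = os.path.join(marker_dir, event_id + ".claimed")
    try:
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another run already handled this event
    os.close(fd)
    return True


def poke(inbox_dir, marker_dir):
    """Sensor check: True if a file in inbox_dir could be newly claimed."""
    for name in sorted(os.listdir(inbox_dir)):
        if claim_event(marker_dir, name):
            return True
    return False


def build_polling_dag(inbox_dir, marker_dir, target_dag_id):
    """Wire steps 1-3 together (hypothetical helper)."""
    # Imported lazily so the helpers above work without Airflow installed.
    # Assumption: Airflow 1.10.x module paths.
    from datetime import datetime
    from airflow import DAG
    from airflow.contrib.sensors.python_sensor import PythonSensor
    from airflow.operators.dagrun_operator import TriggerDagRunOperator

    dag = DAG(
        dag_id="poll_and_trigger",       # hypothetical name
        schedule_interval="* * * * *",   # step 1: run every minute
        start_date=datetime(2020, 2, 1),
        max_active_runs=2,
        catchup=False,
    )
    wait = PythonSensor(
        task_id="wait_for_event",
        python_callable=poke,
        op_args=[inbox_dir, marker_dir],
        task_concurrency=1,              # step 2: one sensor at a time
        dag=dag,
    )
    trigger = TriggerDagRunOperator(
        task_id="trigger_target",
        trigger_dag_id=target_dag_id,    # step 3: start the real DAG
        dag=dag,
    )
    wait >> trigger
    return dag
```

In a real DAG file the result would have to be assigned at module level (for example dag = build_polling_dag(...)) for the scheduler to pick it up. Note that in this sketch an event is claimed before the trigger task runs, so a failed trigger would need its marker file removed by hand to be retried.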
This should work as long as you don't need the triggered DAG to run more than once a minute. You will also need to handle the sensor logic with care, so that the next sensor run doesn't trigger the same DAG a second time.

Damian

-----Original Message-----
From: bharath palaksha <[email protected]>
Sent: Friday, February 14, 2020 05:33
To: [email protected]
Subject: Re: Trigger based dag run

1. The external system I am thinking of doesn't have the ability to run code.
2. If we use a sensor as the first task, how does the dag run get created so that the sensor task can run in the first place?

Running user code on the scheduler does seem problematic; we can think of some other way for that. I will think about it.

Thanks,
Bharath

On Fri, Feb 14, 2020 at 3:57 PM Ash Berlin-Taylor <[email protected]> wrote:

> If you have the ability to run code from the external system you might
> want to consider using the ("experimental") API to trigger the dag run
> from the external system?
>
> http://airflow.apache.org/docs/stable/api.html#post--api-experimental-dags--DAG_ID--dag_runs
>
> When using the API doesn't work for you the common approach I have
> seen is as you hint at -- having a "trigger" dag that runs (frequently
> depending on your needs), checks the external condition and uses
> TriggerDagRunOperator.
>
> The other way I have seen this done is to just have the first task of
> your dag be a sensor that checks/waits on the external resource. With
> the recently added "reschedule" mode of sensors this also doesn't tie
> up a worker slot when the sensor isn't running. This is the approach I
> have used in the past when processing weekly datasets that would
> appear anywhere in a 72 hour window after the expected delivery time.
>
> Given these options exist I'm not quite sure I see the need for a new
> parameter to the DAG (especially one which runs user code in the
> scheduler, that gets quite a strong no from me). Could you perhaps
> explain your idea in more detail, specifically how it fits in to your
> workflow, and why you don't want to use the two methods I talked
> about here?
>
> Thanks,
> Ash
>
> On Feb 14 2020, at 10:10 am, bharath palaksha <[email protected]> wrote:
> > Hi,
> >
> > I have been using Airflow extensively in my current work at Walmart
> > Labs. While working on our requirements, I came across a
> > functionality which is missing in Airflow and which, if implemented,
> > would be very useful.
> >
> > Currently, Airflow is a schedule-based workflow manager: a cron
> > expression defines the creation of dag runs. If there is a
> > dependency on a different dag, TriggerDagRunOperator helps in
> > creating dag runs.
> >
> > Suppose there is a dependency which is outside of the Airflow
> > cluster, e.g. a different database, a filesystem or an event from an
> > API which is an upstream dependency. There is no way in Airflow to
> > achieve this unless we schedule a DAG for a very short interval and
> > allow it to poll.
> >
> > To solve the above issue, what if Airflow takes 2 different args -
> > schedule_interval and trigger_sensor.
> >
> > - schedule_interval - works the same way as it already does now
> > - trigger_sensor - accepts a sensor which returns true when an event
> >   is sensed, and this in turn creates a dag run
> >
> > If you specify both arguments, schedule_interval takes precedence.
> > The scheduler parses all DAGs in a loop on every heartbeat, checks
> > for DAGs which have reached their scheduled time and creates a DAG
> > run. The same loop can also check for trigger_sensor and, if the
> > argument is set, check whether it returns true in order to create a
> > dag run. This might slow down the scheduler as it now has to execute
> > sensors; we can find some other way to avoid the slowness.
> >
> > Can we create an AIP for this? Any thoughts?
> >
> > Thanks,
> > Bharath
