My $.02 - posted to SO as well. I fought this use case for a long time. In short, a dag that’s built based on the state of a changing resource, especially a db table, doesn’t fly so well in airflow.
My solution was to write a small custom operator that’s a subclass if truggerdagoperator, it does the query and then triggers dagruns for each of the subprocess. It makes the process “join” downstream more interesting, but in my use case I was able to work around it with another dag that polls and short circuits if all the sub processes for a given day have completed. In other cases partition sensors can do the trick. I have several use cases like this (iterative dag trigger based on a dynamic source), and after a lot of fighting with making dynamic Subdags work (a lot), I switched to this “trigger subprocess” strategy and have been doing well since. Note - this may make a large number of dagruns for one targ (the target). This makes the UI challenging in some places, but it’s workable (and I’ve started querying the db directly because I’m not ready to write a plugin that does UI stuffs) Sent from a device with less than stellar autocorrect > On Mar 15, 2018, at 7:22 AM, James Meickle <[email protected]> wrote: > > To my mind, it's not a great idea to clear a resource that you're > dynamically using to determine the contents of a DAG. Is there any way that > you can refactor the table to be immutable? Instead of querying all rows in > the table, you would query records in an "unprocessed" state. Instead of > truncating the table, you would mark everything in the table as > "processed". (Though optional, it would be even better for each row to > store the date it was processed, so that you can re-run this DAG in the > future.) > > If storing that much data or refactoring the table isn't possible, could > you run this query once for the day, store the results in S3 (or Redis, or > ...), and always fetch those results? That way the DAG always has the "most > recent" view, even if you delete records mid-day. > > On Wed, Mar 14, 2018 at 10:20 PM, Aaron Polhamus <[email protected]> > wrote: > >> Question for the community. Did some hunting around and didn't see any >> compelling answers. SO link: >> https://stackoverflow.com/questions/49290546/how-to-set- >> up-a-dag-when-downstream-task-definitions-depend-on-upstream-outcomes >> >> >> -- >> >> >> *Aaron Polhamus* >> *Chief Technology Officer * >> >> Cel (México): +52 (55) 1951-5612 >> Cell (USA): +1 (206) 380-3948 >> Tel: +52 (55) 1168 9757 - Ext. 181 >> >> -- >> ***Por favor referirse a nuestra página web >> <https://www.credijusto.com/aviso-de-privacidad/> para más información >> acerca de nuestras políticas de privacidad.* >> >>
