To my mind, it's not a great idea to clear a resource that you're dynamically using to determine the contents of a DAG. Is there any way that you can refactor the table to be immutable? Instead of querying all rows in the table, you would query records in an "unprocessed" state. Instead of truncating the table, you would mark everything in the table as "processed". (Though optional, it would be even better for each row to store the date it was processed, so that you can re-run this DAG in the future.)
If storing that much data or refactoring the table isn't possible, could you run this query once for the day, store the results in S3 (or Redis, or ...), and always fetch those results? That way the DAG always has the "most recent" view, even if you delete records mid-day. On Wed, Mar 14, 2018 at 10:20 PM, Aaron Polhamus <aa...@credijusto.com> wrote: > Question for the community. Did some hunting around and didn't see any > compelling answers. SO link: > https://stackoverflow.com/questions/49290546/how-to-set- > up-a-dag-when-downstream-task-definitions-depend-on-upstream-outcomes > > > -- > > > *Aaron Polhamus* > *Chief Technology Officer * > > Cel (México): +52 (55) 1951-5612 > Cell (USA): +1 (206) 380-3948 > Tel: +52 (55) 1168 9757 - Ext. 181 > > -- > ***Por favor referirse a nuestra página web > <https://www.credijusto.com/aviso-de-privacidad/> para más información > acerca de nuestras políticas de privacidad.* > >