To my mind, it's not a great idea to clear a resource that you're
dynamically using to determine the contents of a DAG. Is there any way that
you can refactor the table to be immutable? Instead of querying all rows in
the table, you would query records in an "unprocessed" state. Instead of
truncating the table, you would mark everything in the table as
"processed". (Though optional, it would be even better for each row to
store the date it was processed, so that you can re-run this DAG in the
future.)

If storing that much data or refactoring the table isn't possible, could
you run this query once for the day, store the results in S3 (or Redis, or
...), and always fetch those results? That way the DAG always has the "most
recent" view, even if you delete records mid-day.

On Wed, Mar 14, 2018 at 10:20 PM, Aaron Polhamus <aa...@credijusto.com>
wrote:

> Question for the community. Did some hunting around and didn't see any
> compelling answers. SO link:
> https://stackoverflow.com/questions/49290546/how-to-set-
> up-a-dag-when-downstream-task-definitions-depend-on-upstream-outcomes
>
>
> --
>
>
> *Aaron Polhamus*
> *Chief Technology Officer *
>
> Cel (México): +52 (55) 1951-5612
> Cell (USA): +1 (206) 380-3948
> Tel: +52 (55) 1168 9757 - Ext. 181
>
> --
> ***Por favor referirse a nuestra página web
> <https://www.credijusto.com/aviso-de-privacidad/> para más información
> acerca de nuestras políticas de privacidad.*
>
>

Reply via email to