Do you mean something like this:

file = download_from_s3()

dag = DAG(...)

# one task per line of the downloaded file
for line in file.lines():
    MyTask(dag=dag, ...)

If so, then the answer is hundreds to thousands of times per day: each time the
scheduler parses the DAG looking for tasks, _all_ code at the top level is run.
So if you unconditionally download the file, it will be downloaded a lot.
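
One way to limit this is to keep a local copy and only re-download once it is
stale. Below is a minimal sketch using only the standard library, not anything
Airflow provides. The names cached_lines, cache_path, fetch and
max_age_seconds are made up for illustration; fetch stands in for whatever
function actually does your S3 download and returns the contents as a string.

```python
import os
import time


def cached_lines(cache_path, fetch, max_age_seconds=3600):
    """Return the lines of a locally cached file, refreshing it only when stale.

    `fetch` is a hypothetical zero-argument callable that returns the file
    contents as a string (for example, a wrapper around an S3 download).
    """
    try:
        age = time.time() - os.path.getmtime(cache_path)
    except OSError:
        age = None  # no cached copy exists yet

    if age is None or age > max_age_seconds:
        contents = fetch()
        # Write to a temporary file first, then swap it into place, so a
        # concurrent reader never sees a half-written cache file.
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "w") as f:
            f.write(contents)
        os.replace(tmp_path, cache_path)

    with open(cache_path) as f:
        return f.read().splitlines()
```

At the top level of the DAG file you would then call cached_lines(...) in
place of the direct download, so most parses only read the local copy. Note
that each machine that parses the DAG file would keep its own cache.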

-ash

> On 7 Jun 2019, at 18:45, Satya Tumati <satya.tum...@rubrik.com> wrote:
> 
> Hi,
> 
> Our team uses Airflow extensively and we stumbled on an issue that might
> need some help understanding the interplay between scheduler and worker.
> 
> Let's say I wrote a test_dag.py in the dags sub dir and it simply downloads
> a file from S3 which contains a list of strings. Now, I create a linear DAG
> where each task is simply printing a string. These tasks are in the order
> they appear in the file.
> 
> How many times exactly is the file downloaded for a DAG run?
> 
> 
> Thanks,
> Satya
