It's not just the webserver and scheduler that parse your DAG file.
During the execution of a DAG run, the DAG file is re-parsed at the start
of every task instance.  If you have 1000 tasks running in a short period
of time and the DAG file queries the database when it is parsed, that's
1000 queries.  It's possible these queries are piling up in a queue on
your database.  DAG parse time has to be very fast for this reason.
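
One way to keep that per-task re-parse off the database is to have the DAG
file read a small local cache and only refresh it when it is stale.  A
minimal sketch, assuming a JSON cache file and a 5 minute refresh interval;
`fetch_tables_from_db`, `load_tables`, the cache path and the TTL are all
made-up names and values for illustration, not anything from Airflow or
from your setup:

```python
import json
import os
import tempfile
import time

# Hypothetical cache location and refresh interval (assumptions for this sketch).
CACHE_PATH = os.path.join(tempfile.gettempdir(), "dag_tables_cache.json")
CACHE_TTL_SECONDS = 300


def fetch_tables_from_db():
    """Placeholder for the real (slow) query that lists tables; hypothetical."""
    return ["table_a", "table_b"]


def load_tables(fetch=fetch_tables_from_db, path=CACHE_PATH, ttl=CACHE_TTL_SECONDS):
    """Return the table list, calling `fetch` only when the cache is stale."""
    try:
        # Fresh cache: answer from disk, no database round trip.
        if time.time() - os.path.getmtime(path) < ttl:
            with open(path) as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # missing or corrupt cache: fall through and rebuild it

    tables = fetch()
    # Write to a temp file in the same directory, then rename, so that
    # concurrent parsers never read a half-written cache.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(tables, f)
    os.replace(tmp_path, path)
    return tables
```

The DAG file would then loop over `load_tables()` to build its tasks, so
1000 task starts inside the TTL window cost a handful of queries at most
instead of 1000.  Generating the file in a separate job and checking it
into the repo, as Max suggests below, removes the query from parse time
entirely.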



On Thu, Aug 15, 2019 at 1:45 PM Bacal, Eugene <eba...@paypal.com.invalid>
wrote:

>
> Thank you for your reply, Max
>
> Dynamic DAGs query the database for tables and generate DAGs and tasks
> based on the output.
> Run as plain Python scripts, they do not take long to execute:
>
> Dynamic - 500 tasks:
> time python PPAD_OIS_MASTER_IDI.py
> [2019-08-15 12:57:48,522] {settings.py:174} INFO -
> setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
> real    0m1.830s
> user    0m1.622s
> sys     0m0.188s
>
>
> Static - 100 tasks:
> time python PPAD_OPS_CANARY_CONNECTIONS_TEST_8.py
> [2019-08-15 12:59:24,959] {settings.py:174} INFO -
> setting.configure_orm(): Using pool settings. pool_size=30, pool_recycle=300
> real    0m1.009s
> user    0m0.898s
> sys     0m0.108s
>
>
> We have 44 DAGs with 1003 dynamic tasks. Parsing during quiet time:
> DagBag parsing time: 3.9385959999999995
>
> Parsing at execution time, when the scheduler submits the DAGs:
> DagBag parsing time: 99.820316
>
> The delay between task runs inside a single DAG grows from 30 sec to
> 10 min, then drops back even though tasks are still running.
>
> Eugene
>
>
>
>
>
> On 8/15/19, 11:52 AM, "Maxime Beauchemin" <maximebeauche...@gmail.com>
> wrote:
>
>     What is your dynamic DAG doing? How long does it take to execute it
>     just as a python script (`time python mydag.py`)?
>
>     As an Airflow admin, you may want to lower the DAG parsing timeout
>     configuration key to force people to not do crazy things in DAG
>     module scope. At some point at Airbnb we had someone running a Hive
>     query in DAG scope; clearly that needs to be prevented.
>
>     Loading DAGs by calling a database can bring all sorts of surprises
>     that can drive everyone crazy. As mentioned in a recent post,
>     repo-contained, deterministic "less dynamic" DAGs are great, because
>     they are self-contained and allow you to use source-control properly
>     (revert a bad change for instance). That may mean having a process or
>     script that compiles external things that are dynamic into things
>     like yaml files checked into the code repo. Things as simple as
>     parsing duration become more predictable (network latency and
>     database load are not part of that equation), but more importantly,
>     all changes become tracked in the code repo.
>
>     yaml parsing in python can be pretty slow too, and there are
>     solutions / alternatives there. Hocon is great. Also C-accelerated
>     yaml is possible:
>
>     https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
>
>     Max
>
>     On Wed, Aug 14, 2019 at 9:56 PM Bacal, Eugene <eba...@paypal.com.invalid>
>     wrote:
>
>     > Hello Airflow team,
>     >
>     > Please advise if you can. In our environment, we have noticed that
>     > dynamic tasks place quite a lot of stress on the scheduler and
>     > webserver and increase MySQL DB connections.
>     > We run about 1000 dynamic tasks every 30 min, and parsing time
>     > increases from 5 to 65 sec, with runtime going from 2 sec to 350+.
>     > This happens at execution time, then it drops back to normal while
>     > tasks are still executing.
>     > The webserver hangs for a few minutes.
>     >
>     > Airflow 1.10.1.
>     > MySQL DB
>     >
>     > Example:
>     >
>     > Dynamic Tasks:
>     > Number of DAGs: 44
>     > Total task number: 950
>     > DagBag parsing time: 65.879642000000001
>     >
>     > Static Tasks:
>     > Number of DAGs: 73
>     > Total task number: 1351
>     > DagBag parsing time: 1.731088
>     >
>     > Is this something you are aware of? Any advice on dynamic task
>     > optimization/best practices?
>     >
>     > Thank you in advance,
>     > Eugene
>     >
>     >
>     >
>
>
>
