Sweet work Kamil and others! I'll try to go through them today!

Cheers, Fokko

On Mon, Feb 24, 2020 at 22:37, Tao Feng <fengta...@gmail.com> wrote:

> Great work Kamil! Let us know once it has landed in one of the future
> releases. Would love to try it out :)
>
> Best,
> -Tao
>
> On Mon, Feb 24, 2020 at 12:54 PM Qingping Hou <q...@scribd.com> wrote:
>
> > Awesome work Kamil! Great to see us embracing query batching in the
> > code base. I can't wait to deploy those optimizations into our
> > production environment.
> >
> > Thanks,
> > QP Hou
> >
> > On Mon, Feb 24, 2020 at 8:35 AM Kamil Breguła <kamil.breg...@polidea.com> wrote:
> > >
> > > Hello,
> > >
> > > Polidea [1], together with Databand [2], has taken steps to optimize
> > > scheduler performance.
> > > I prepared the following changes last weekend:
> > > 1. [AIRFLOW-6856] Bulk fetch paused_dag_ids
> > > https://github.com/apache/airflow/pull/7476
> > > 2. [AIRFLOW-6857] Bulk sync DAGs
> > > https://github.com/apache/airflow/pull/7477
> > > 3. [AIRFLOW-6862] Do not check the freshness of fresh DAG
> > > https://github.com/apache/airflow/pull/7481
> > > 4. [AIRFLOW-6869] Bulk fetch DAGRuns for _process_task_instances
> > > https://github.com/apache/airflow/pull/7489
> > > 5. [AIRFLOW-6881] Bulk fetch DAGRun for create_dag_run
> > > https://github.com/apache/airflow/pull/7502
> > > 6. [AIRFLOW-6887] Do not check the state of fresh DAGRun
> > > https://github.com/apache/airflow/pull/7510
> > > These changes have not been merged yet, so that a wider audience can
> > > review them. Any feedback is very helpful. The performance benchmark
> > > results are available in the description of each change.
> > >
> > > Taken together, the changes give the following results.
> > >
> > > Before:
> > > Average time: 8080.246 ms
> > > Queries count: 2692
> > > After:
> > > Average time: 628.801 ms
> > > Queries count: 5
> > > Diff:
> > > Average time: -7452 ms (-92%)
> > > Queries count: -2687 (-99%)
> > >
> > > My changes focus only on DagFileProcessor, but this is the component
> > > that generates the most database queries and takes a significant share
> > > of the scheduler's time.
> > >
> > > Tomek Urbaszek's change, which also improves performance, was merged
> > > earlier:
> > > 7. [AIRFLOW-6590] Use batch db operations in jobs
> > > https://github.com/apache/airflow/pull/7370
> > >
> > > These are not the last performance improvements. We are still working
> > > on this, and more changes will follow.
> > >
> > > Many thanks to our friends at Databand [https://databand.ai/] for their
> > > support.
> > >
> > > Best regards,
> > > Kamil Breguła
> > >
> > > [1] https://www.polidea.com/services/
> > > [2] https://databand.ai/about/
> >
>
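
For anyone reading this thread without the PRs open: the "bulk fetch" items in Kamil's list replace one query per DAG with a single query for the whole batch. Below is a minimal, self-contained sketch of that general pattern. It is not code from the Airflow PRs. The `dag` table, the `CountingConnection` wrapper and both function names are hypothetical, and it uses the standard-library `sqlite3` module rather than Airflow's SQLAlchemy session.

```python
import sqlite3


class CountingConnection:
    """Hypothetical thin wrapper that counts the SQL statements it issues."""

    def __init__(self, conn):
        self.conn = conn
        self.query_count = 0

    def execute(self, sql, params=()):
        self.query_count += 1
        return self.conn.execute(sql, params)


def make_db(num_dags=100):
    # Hypothetical schema: one row per DAG, every third DAG is paused.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
    conn.executemany(
        "INSERT INTO dag VALUES (?, ?)",
        [(f"dag_{i}", int(i % 3 == 0)) for i in range(num_dags)],
    )
    return conn


def paused_one_by_one(db, dag_ids):
    # One SELECT per DAG: the query count grows with the number of DAGs.
    paused = set()
    for dag_id in dag_ids:
        row = db.execute(
            "SELECT is_paused FROM dag WHERE dag_id = ?", (dag_id,)
        ).fetchone()
        if row and row[0]:
            paused.add(dag_id)
    return paused


def paused_bulk(db, dag_ids):
    # A single SELECT with an IN clause, regardless of the number of DAGs.
    # Very large batches may need chunking because databases limit the
    # number of bound parameters per statement.
    dag_ids = list(dag_ids)
    if not dag_ids:
        return set()
    placeholders = ", ".join("?" for _ in dag_ids)
    rows = db.execute(
        f"SELECT dag_id FROM dag WHERE is_paused = 1 AND dag_id IN ({placeholders})",
        dag_ids,
    )
    return {row[0] for row in rows}


if __name__ == "__main__":
    dag_ids = [f"dag_{i}" for i in range(100)]
    slow = CountingConnection(make_db())
    fast = CountingConnection(make_db())
    assert paused_one_by_one(slow, dag_ids) == paused_bulk(fast, dag_ids)
    print(slow.query_count, fast.query_count)  # prints: 100 1
```

Both functions return the same set of paused DAG ids, but the first issues as many queries as there are DAGs while the second issues one. The actual savings in the scheduler depend on the real queries and are reported in the benchmark sections of the individual PRs.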
