lyso commented on PR #29602:
URL: https://github.com/apache/airflow/pull/29602#issuecomment-1454694919

   > 
   
   
   
   > Thank for your contributions.
   > 
   > Could you describe a bit more of configurations of Airflow, your DB 
backend, how far it away out of Airflow services (latency), and your workload. 
I asked it because for me it is looks like by this PR we might replace [one 
magic 
number](https://github.com/apache/airflow/pull/3324#discussion_r187540176) by 
another.
   > 
   > I've also found that `max_tis_per_query` use in BackfillJob:
   > 
   > 
https://github.com/apache/airflow/blob/aa4858d85a00759a4a84bdd5fb3fe6cec196831e/airflow/jobs/backfill_job.py#L957-L959
   > 
   > Do you know how it affects this part of code?
   
   I use MWAA medium class, so DB is RDS posgreSQL, 2xSchedulers run on Fargate 
Container with 2vCPU and 4GB RAM. Executor is CeleryExecutor via SQS.
   
   At worst case, there are 200-300 tasks scheduled at the beginning of hours. 
Because of the 512 ``max_tis_per_query``, all of these tasks into single batch 
of the query when a scheduler enqueues tasks. It takes about 1-2 minutes to 
enqueue these tasks and send out to Celery broker. Based on my observation, the 
bottleneck is not the DB query, but is the in-memory process of 
[``_critical_section_enqueue_task_instances``](https://github.com/apache/airflow/blob/c53a3e153f0ab5ca933a94adc01dc6314ea8d4d1/airflow/jobs/scheduler_job.py#L972)
 and ``executor.heartbeat``. The scheduler need 1-2 minutes to change TI state 
from ``SCHEDULED`` to ``QUEUED`` and then send out to broker.
   
   During the 1-2 minutes period, the scheduler cannot make heartbeat and could 
be detected as inactive, then possibly causing unexpected behaviors like 
orphaned task adoption and task external kills. 
   
   Furthermore, in case of 2 or more schedulers (HA), setting 
``max_tis_per_query`` to 512 can lead to one scheduler overloaded, but other 
schedulers have no tasks in their Executor (if ``core.parallelism`` is also 
very big).  
   
   When tuning the parameter, I lowered the ``max_tis_per_query`` to 50, then 
the entire cluster become much healthier. Schedulers can quickly put a batch of 
tasks to Celery broker. Scheduler Heartbeat rate is improved. And, more 
importantly, load are shared between 2 schedulers.
   
   When submitting the value in this PR, I put the magic number as 16 as 
default for ``max_tis_per_query``. I did not choose the value I use in my 
cluster, because I believe this value should be adjusted based on the hardware 
configuration. 16 is a value given with respect to ``core.parallelism`` 
(default 32). When setting ``core.parallelism=32``, we assume the hardware of 
scheduler and database is not very powerful. Optimally ``max_tis_per_query`` 
should be a divisor of ``core.parallelism``. 8 would be too small. So I chose 
16.
   
   #### About ``max_tis_per_query`` in Backfill job
   
   It is used in reset orphaned task method. The ``helpers.reduce_in_chunks`` 
will split the entire list into smaller chunks (of size ``max_tis_per_query``, 
and query DB per chunk. 
   
   I believe 16 is also a setting here. Since the  ``core.parallelism``'s 
default is 32, we assume this is optimal for the assumed hardware. In this 
case, having the query batch size as 512 would be too big for the assumed 
hardware.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to