Mzhussiyev commented on issue #52711:
URL: https://github.com/apache/airflow/issues/52711#issuecomment-3051290446

   Hey @karenbraganz, thanks for the detailed answer.
   
   We've analyzed the situation for a couple of days. This has happened several times again and ...
   Indeed, it seems like something is wrong with DAG deactivation from the serialized_dag table.
   From the scheduler container logs we spotted this behavior; it happens very often (probably during heavy DAG runs) and for a lot of DAGs since our Airflow upgrade...
   
   <img width="1507" height="618" alt="Image" src="https://github.com/user-attachments/assets/70029a3f-8cd0-433f-a632-6567924bfa3e" />
   
   Moreover, it has become difficult to open and view DAG tasks in the UI, even after the DAG run has finished.
   The UI gets stuck trying to load/serialize the DAG for a long time.
   
   <img width="865" height="837" alt="Image" src="https://github.com/user-attachments/assets/722a811f-fb40-4c4d-8b9b-e066be72d80f" />
   
   I reckon we have to tune the scheduler/core parameters to stabilize the situation, increase the DAG parse interval, etc.
   Could you recommend appropriate parameter values?
   
   Our current server configs in AWS:
   
   ec2 instance type: m6a.xlarge
   db instance type: db.t4g.small
   
   RAM consumption peaks at only about 50% of full capacity.
   CPU consumption is stable, averaging around 20% (though the max nearly reaches 100%).
   
   <img width="2960" height="937" alt="Image" src="https://github.com/user-attachments/assets/d64a2d57-85a5-42ae-9937-064d314c98ec" />
   
   Current Airflow params:
          - 2 pools with 16 slots each.
   
          # Apache Airflow `core` Configurations Section
         AIRFLOW__CORE__EXECUTOR      = "LocalExecutor"
         AIRFLOW__CORE__LOAD_EXAMPLES = "False"
         # It cannot exceed pool slot size, which is 32 (sum of 2 pools with 16 
each)
         AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG = "14"
          # parallelism caps the total number of task instances in the queued + running states across the installation
         AIRFLOW__CORE__PARALLELISM = "64"
         # This flag sets the minimum interval (in seconds) after which the 
serialized DAGs in the DB should be updated.
         # This helps in reducing database write rate
         AIRFLOW__CORE__MIN_SERIALIZED_DAG_UPDATE_INTERVAL = "180"
         # This option controls how often the Serialized DAG will be re-fetched 
from the DB when it is already loaded in the DagBag in the Webserver.
         # Setting this higher will reduce load on the DB, but at the expense 
of displaying a possibly stale cached version of the DAG
         AIRFLOW__CORE__MIN_SERIALIZED_DAG_FETCH_INTERVAL = "120"
         # How long before timing out a python file import
         AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT = "60"
         # How long before timing out a DagFileProcessor, which processes a dag 
file
         AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT = "120"
         AIRFLOW__CORE__ENABLE_XCOM_PICKLING       = "False"
   
         AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_RECYCLE = "300"
   
         # Apache Airflow `scheduler` Configurations Section
         AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC = "8"
         # The frequency that each DAG file is parsed, in seconds. Updates to 
DAGs are reflected after this interval
         AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL = "180"
          # How often to scan the dags folder for new files; a new python DAG file can take up to this long to be processed by airflow and show up in the UI.
         AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL = "180"
         AIRFLOW__SCHEDULER__PRINT_STATS_INTERVAL  = "300"
         # Should the scheduler issue SELECT ... FOR UPDATE in relevant queries.
         # If this is set to False then you should not run more than a single 
scheduler at once.
         AIRFLOW__SCHEDULER__USE_ROW_LEVEL_LOCKING = "False"
   
         # The maximum number of DAGs to create DAG runs for per scheduler loop.
         # Decrease the value to free resources for scheduling tasks. The 
default value is 10.
         AIRFLOW__SCHEDULER__MAX_DAGRUNS_TO_CREATE_PER_LOOP = "2"
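
   As a cross-check on how these values interact: a task instance only runs when every applicable cap has room, so the effective per-DAG concurrency is the tightest of the limits. A quick plain-Python sketch using the numbers above (the 16-slot pool figure is taken from the note at the top of this list; this is arithmetic only, not Airflow code):

```python
# Sketch: the number of tasks a single DAG can have running at once is
# bounded by the tightest of three caps. Values mirror the config above.

parallelism = 64               # AIRFLOW__CORE__PARALLELISM (installation-wide cap)
pool_slots = 16                # slots in the pool the DAG's tasks use
max_active_tasks_per_dag = 14  # AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG

def effective_dag_concurrency(parallelism, pool_slots, max_active_tasks_per_dag):
    """The tightest limit wins: a task only starts if every cap has room."""
    return min(parallelism, pool_slots, max_active_tasks_per_dag)

print(effective_dag_concurrency(parallelism, pool_slots, max_active_tasks_per_dag))  # 14
```

   With these settings, `max_active_tasks_per_dag = 14` is the binding limit for each DAG, not the pool size or `parallelism`.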
   
   And there is another question:
   We have 2 DAGs (D1 and D2).
   D1 is scheduled at 00:00 and uses pool1; its average run duration is 3h.
   D2 is scheduled at 01:30 and uses pool2.
   BUT D2 is actually queued and only starts once the D1 run finishes, around 03:00, even though they use separate pools.
   How does this happen, and how can we make D2 start at its scheduled time of 01:30?
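
   For context on why separate pools may not be enough: pools only gate their own slots, while caps like the executor-wide `parallelism` are shared by all DAGs, so one possible mechanism is D1 filling a shared cap and starving D2. A toy simulation of that mechanism (plain stdlib Python, not Airflow code; the small numbers are illustrative, not our real config):

```python
# Toy simulation: pools gate slots per pool, but the global parallelism
# cap is shared. If D1 already fills it, D2's tasks stay queued even
# though pool2 has free slots.

PARALLELISM = 4                          # deliberately tiny for illustration
POOL_SLOTS = {"pool1": 16, "pool2": 16}

running = []  # (dag_id, pool) pairs currently "running"

def try_start(dag_id, pool):
    pool_in_use = sum(1 for _, p in running if p == pool)
    if len(running) >= PARALLELISM:
        return "queued (global parallelism cap hit)"
    if pool_in_use >= POOL_SLOTS[pool]:
        return "queued (pool full)"
    running.append((dag_id, pool))
    return "running"

# D1 grabs every executor slot first...
for _ in range(4):
    print("D1:", try_start("D1", "pool1"))

# ...so D2 cannot start, despite pool2 being completely empty.
print("D2:", try_start("D2", "pool2"))
```

   In a real deployment the shared cap could also be something like `max_active_runs` on the DAG or scheduler throughput rather than `parallelism` itself; comparing queued-task reasons in the UI against each of these limits should narrow it down.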
   
   Thanks

