potiuk commented on a change in pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#discussion_r499630961



##########
File path: airflow/models/dag.py
##########
@@ -1824,10 +1960,34 @@ class DagModel(Base):
     # Tags for view filter
     tags = relationship('DagTag', cascade='all,delete-orphan', backref=backref('dag'))
 
+    concurrency = Column(Integer, nullable=False)
+
+    has_task_concurrency_limits = Column(Boolean, nullable=False)
+
+    # The execution_date of the next dag run
+    next_dagrun = Column(UtcDateTime)
+    # Earliest time at which this ``next_dagrun`` can be created
+    next_dagrun_create_after = Column(UtcDateTime)
+
     __table_args__ = (
         Index('idx_root_dag_id', root_dag_id, unique=False),
+        Index('idx_next_dagrun_create_after', next_dagrun_create_after, unique=False),
     )
 
+    NUM_DAGS_PER_DAGRUN_QUERY = conf.getint(
+        'scheduler',
+        'num_dags_needing_dagrun_per_scheduler_loop',
+        fallback=10
+    )

Review comment:
       Yeah. I think we should keep it documented, along with some of the 
consequences: what we "expect" to happen if we decrease or increase it. This 
can all go in the docs, similar to the comments we have for the other 
parameter that was added.
   
   As I see it, changing this number has this behavior (I hope I am 
interpreting it correctly):
   
   If we decrease it, scheduling might take longer overall but with smaller 
random latency for some Dag Runs: when we have more dags to process in one 
transaction, there might be a delay in scheduling those dags which are at the 
beginning of the batch. However, decreasing it can reduce the overall 
"capacity" of the scheduler, as processing DagRuns in a batch simply uses fewer 
resources than processing them one-by-one. With a smaller value there is also 
less contention possible (do we have other queries/processes that compete for 
those locks?). If so, then with bigger batches there is a higher chance of lock 
contention happening, and the overall capacity of the system can be impacted by 
that. Another impact here is that when we have a few big dags (in a 
multi-scheduler scenario), one scheduler will be doing most of the work.
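
To make the trade-off above a bit more concrete, here is a toy model in Python. It is only a sketch and not Airflow code: the `simulate` function, the fixed per-transaction overhead and the per-dag cost are all made-up assumptions, and it ignores lock contention and multiple schedulers entirely. It only illustrates why a bigger batch could raise throughput while making the first dag in a batch wait longer for the commit.

```python
def simulate(num_dags, batch_size, txn_overhead=5.0, per_dag_cost=1.0):
    """Toy model (hypothetical costs, not measured from Airflow).

    Returns (total_time, max_commit_delay) where:
      * total_time is the time needed to process all dags in batches,
      * max_commit_delay is the longest time the first dag of a batch
        waits, after its own processing, for the batch to be committed.
    """
    total_time = 0.0
    max_commit_delay = 0.0
    remaining = num_dags
    while remaining > 0:
        n = min(batch_size, remaining)
        # Each transaction pays a fixed overhead plus a cost per dag.
        total_time += txn_overhead + n * per_dag_cost
        # The first dag in the batch waits for the other n - 1 dags.
        max_commit_delay = max(max_commit_delay, (n - 1) * per_dag_cost)
        remaining -= n
    return total_time, max_commit_delay


for size in (1, 10, 100):
    total, delay = simulate(num_dags=100, batch_size=size)
    print(f"batch_size={size}: total_time={total}, max_commit_delay={delay}")
```

With these made-up costs, a batch size of 1 takes 600 time units in total with no commit delay, while a batch size of 100 takes 105 units but the first dag waits 99 units before its run becomes visible. Real numbers would depend on the database and the dags involved.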
   
   
   Is that a correct description?




