[
https://issues.apache.org/jira/browse/AIRFLOW-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318964#comment-16318964
]
Josh Bacon commented on AIRFLOW-57:
-----------------------------------
FWIW and FYI, regarding +AIRFLOW__CORE__DAG_CONCURRENCY+ and the statement_
"per_worker would suggest that it's a global setting for workers, but I think
you can have workers with different values set for this"_. My tests using
celery workers have shown that individual workers +*cannot*+ override/change
this concurrency "per-worker", the scheduler's configuration is always applied
instead. In my tests AIRLFOW__CELERY__CELERYD_CONCURRENCY was increased on my
workers to run more concurrent tasks, but no tasks were sent from the scheduler
to fill this concurrency (which was limited by a lower
AIRFLOW__CORE__DAG_CONCURRENCY).
> Rename concurrency configuration variables to be more clear
> -----------------------------------------------------------
>
> Key: AIRFLOW-57
> URL: https://issues.apache.org/jira/browse/AIRFLOW-57
> Project: Apache Airflow
> Issue Type: Improvement
> Affects Versions: Airflow 1.7.0
> Reporter: Bence Nagy
> Priority: Minor
> Labels: easyfix, newbie
>
> Currently the following config variables exists for controlling parallel
> execution limits:
> {code}
> # The amount of parallelism as a setting to the executor. This defines
> # the max number of task instances that should run simultaneously
> # on this airflow installation
> parallelism = 32
> # The number of task instances allowed to run concurrently by the scheduler
> dag_concurrency = 16
> # When not using pools, tasks are run in the "default pool",
> # whose size is guided by this config element
> non_pooled_task_slot_count = 128
> # The maximum number of active DAG runs per DAG
> max_active_runs_per_dag = 16
> {code}
> Let's go through these one by one:
> {{parallelism}}: not a very descriptive name, considering that all the above
> settings are for parallelism. The description says it sets the maximum task
> instances for the airflow installation, which is a bit ambiguous — if I have
> two hosts running airflow workers, I'd have airflow installed on two hosts,
> so that should be two installations, but based on context 'per installation'
> here means 'per Airflow state database'. I'd name this {{max_active_tasks}}.
> {{dag_concurrency}}: Despite the name based on the comment this is actually
> the task concurrency, and it's per worker. I'd name this
> {{max_active_tasks_for_worker}} ({{per_worker}} would suggest that it's a
> global setting for workers, but I think you can have workers with different
> values set for this).
> {{non_pooled_task_slot_count}}: This is a weird one. I'm going to pass on
> suggesting a name for it because I just can't think of any reason this config
> variable should exist. We already have a global task instance limit, and we
> have pools to limit access to certain resources — in what case would someone
> want to limit access to everything other than certain resources? So, yeah,
> skipping this one. In case this was needed only due to how pools are
> implemented, I'd suggest setting the limit to {{sys.maxsize}} and just
> removing the config variable.
> {{max_active_runs_per_dag}}: This one's kinda alright, but since it seems to
> be just a default value for the matching {{DAG}} kwarg, it might be nice to
> reflect that in the name, something like {{default_max_active_runs_for_dags}}
> So let's move on to the {{DAG}} kwargs:
> {{concurrency}}: Again, having a general name like this, coupled with the
> fact that concurrency is used for something different elsewhere makes this
> pretty confusing. I'd call this {{max_active_tasks}}.
> {{max_active_runs}}: This one sounds alright to me.
> So. If people agree that this is something that should be fixed, I think it'd
> be nice to get this in the 1.7.1 release, especially considering that it
> should be really easy to make the change backwards compatible.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)