[ 
https://issues.apache.org/jira/browse/AIRFLOW-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318964#comment-16318964
 ] 

Josh Bacon edited comment on AIRFLOW-57 at 1/9/18 7:03 PM:
-----------------------------------------------------------

FWIW and FYI, regarding +DAG_CONCURRENCY+ and the statement_ "per_worker would 
suggest that it's a global setting for workers, but I think you can have 
workers with different values set for this"_. My tests using celery workers 
have shown that individual workers +*cannot*+ override/change this concurrency 
"per-worker", the scheduler's configuration is always applied instead. In my 
tests AIRLFOW__CELERY__CELERYD_CONCURRENCY was increased on my workers to run 
more concurrent tasks, but no tasks were sent from the scheduler to fill this 
concurrency (which was limited by a lower AIRFLOW__CORE__DAG_CONCURRENCY).


was (Author: jbacon):
FWIW and FYI, regarding +AIRFLOW__CORE__DAG_CONCURRENCY+ and the statement_ 
"per_worker would suggest that it's a global setting for workers, but I think 
you can have workers with different values set for this"_. My tests using 
celery workers have shown that individual workers +*cannot*+ override/change 
this concurrency "per-worker", the scheduler's configuration is always applied 
instead. In my tests AIRLFOW__CELERY__CELERYD_CONCURRENCY was increased on my 
workers to run more concurrent tasks, but no tasks were sent from the scheduler 
to fill this concurrency (which was limited by a lower 
AIRFLOW__CORE__DAG_CONCURRENCY).

> Rename concurrency configuration variables to be more clear
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-57
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-57
>             Project: Apache Airflow
>          Issue Type: Improvement
>    Affects Versions: Airflow 1.7.0
>            Reporter: Bence Nagy
>            Priority: Minor
>              Labels: easyfix, newbie
>
> Currently the following config variables exists for controlling parallel 
> execution limits:
> {code}
> # The amount of parallelism as a setting to the executor. This defines
> # the max number of task instances that should run simultaneously
> # on this airflow installation
> parallelism = 32
> # The number of task instances allowed to run concurrently by the scheduler
> dag_concurrency = 16
> # When not using pools, tasks are run in the "default pool",
> # whose size is guided by this config element
> non_pooled_task_slot_count = 128
> # The maximum number of active DAG runs per DAG
> max_active_runs_per_dag = 16
> {code}
> Let's go through these one by one:
> {{parallelism}}: not a very descriptive name, considering that all the above 
> settings are for parallelism. The description says it sets the maximum task 
> instances for the airflow installation, which is a bit ambiguous — if I have 
> two hosts running airflow workers, I'd have airflow installed on two hosts, 
> so that should be two installations, but based on context 'per installation' 
> here means 'per Airflow state database'. I'd name this {{max_active_tasks}}.
> {{dag_concurrency}}: Despite the name based on the comment this is actually 
> the task concurrency, and it's per worker. I'd name this 
> {{max_active_tasks_for_worker}} ({{per_worker}} would suggest that it's a 
> global setting for workers, but I think you can have workers with different 
> values set for this).
> {{non_pooled_task_slot_count}}: This is a weird one. I'm going to pass on 
> suggesting a name for it because I just can't think of any reason this config 
> variable should exist. We already have a global task instance limit, and we 
> have pools to limit access to certain resources — in what case would someone 
> want to limit access to everything other than certain resources? So, yeah, 
> skipping this one. In case this was needed only due to how pools are 
> implemented, I'd suggest setting the limit to {{sys.maxsize}} and just 
> removing the config variable.
> {{max_active_runs_per_dag}}: This one's kinda alright, but since it seems to 
> be just a default value for the matching {{DAG}} kwarg, it might be nice to 
> reflect that in the name, something like {{default_max_active_runs_for_dags}}
> So let's move on to the {{DAG}} kwargs:
> {{concurrency}}: Again, having a general name like this, coupled with the 
> fact that concurrency is used for something different elsewhere makes this 
> pretty confusing. I'd call this {{max_active_tasks}}.
> {{max_active_runs}}: This one sounds alright to me.
> So. If people agree that this is something that should be fixed, I think it'd 
> be nice to get this in the 1.7.1 release, especially considering that it 
> should be really easy to make the change backwards compatible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to