[jira] [Updated] (AIRFLOW-6912) Airflow unable to run concurrent ssh tasks (up to 27) in a single dag

Nguyen Lam Phuc (Jira) Mon, 24 Feb 2020 23:13:19 -0800


     [ https://issues.apache.org/jira/browse/AIRFLOW-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Nguyen Lam Phuc updated AIRFLOW-6912:
-------------------------------------
    Description: 
*Current working Airflow version:* 1.10.7

*Environment:* Kubernetes 1.13, Helm chart 5.2.4

*Airflow version that breaks:* 1.10.9

*Description:*
 * We have a list of _ssh_operators_ tasks in a dag that need to be executed in 
parallel (as shown in the screenshot), and everything works fine on Airflow 
version _1.10.7_
 * We tried to update Airflow to version _1.10.9_, and a varying number of tasks 
now fail in no consistent order (as shown in the screenshot).
 * Here are some of the errors that we collected:
 ** (psycopg2.OperationalError) FATAL:  remaining connection slots are reserved 
for non-replication superuser connections

 ** Executor reports task instance <TaskInstance: <dag_name>.<task_name> 
2020-02-20 12:20:00+00:00 [queued]> finished (failed) although the task says 
its queued. Was the task killed externally?
 ** <TaskInstance: <dag_name>.<task_name> 2020-02-20 08:20:00+00:00 [running]> 
detected as zombie

 ** (psycopg2.OperationalError) FATAL: remaining connection slots are reserved 
for non-replication superuser connections

*Actions taken:*
 * We suspected that there were not enough database connections, so we 
increased the _AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE_ value from 5 to 50, but the 
problem persisted.
 * We reverted to version _1.10.7_ and everything works normally again.

  was:
*Current working Airflow version:* 1.10.7

*Environment:* Kubernetes 1.13, Helm chart 5.2.4

*Airflow version that breaks:* 1.10.9

*Description:*
 * **We have a list of _ssh_operators_ tasks in a dag that need to be executed 
in parallel (as shown in the screenshot) and everything is working fine at 
Airflow version _1.10.7_
 * We tried to update Airflow to version _1.10.9_ and the tasks break in random 
orders and number. (as shown in the screenshot)
 * Here are some of the error that we collected:
 ** 
(psycopg2.OperationalError) FATAL:  remaining connection slots are reserved for 
non-replication superuser connections
 ** 
Executor reports task instance <TaskInstance: <dag_name>.<task_name> 2020-02-20 
12:20:00+00:00 [queued]> finished (failed) although the task says its queued. 
Was the task killed externally?
 ** 
<TaskInstance: <dag_name>.<task_name> 2020-02-20 08:20:00+00:00 [running]> 
detected as zombie
 ** 
(psycopg2.OperationalError) FATAL:  remaining connection slots are reserved for 
non-replication superuser connections

*Actions taken:*
 * **We suspected that there were not enough database connections so we 
increased the _AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE_ value from 5 to 50 but the 
problem still persists.
 * We reverted the version back to _1.10.7_ and everything works as per normal.


> Airflow unable to run concurrent ssh tasks (up to 27) in a single dag
> ---------------------------------------------------------------------
>
>                 Key: AIRFLOW-6912
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6912
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, database, executors
>    Affects Versions: 1.10.9
>         Environment: Kubernetes 1.13
>            Reporter: Nguyen Lam Phuc
>            Priority: Blocker
>             Fix For: 1.10.7
>
>         Attachments: version_1_10_7-working.png, version_1_10_9_break.png
>
>
> *Current working Airflow version:* 1.10.7
> *Environment:* Kubernetes 1.13, Helm chart 5.2.4
> *Airflow version that breaks:* 1.10.9
> *Description:*
>  * We have a list of _ssh_operators_ tasks in a dag that need to be executed 
> in parallel (as shown in the screenshot), and everything works fine on 
> Airflow version _1.10.7_
>  * We tried to update Airflow to version _1.10.9_, and a varying number of 
> tasks now fail in no consistent order (as shown in the screenshot).
>  * Here are some of the errors that we collected:
>  ** (psycopg2.OperationalError) FATAL:  remaining connection slots are 
> reserved for non-replication superuser connections
>  ** Executor reports task instance <TaskInstance: <dag_name>.<task_name> 
> 2020-02-20 12:20:00+00:00 [queued]> finished (failed) although the task says 
> its queued. Was the task killed externally?
>  ** <TaskInstance: <dag_name>.<task_name> 2020-02-20 08:20:00+00:00 
> [running]> detected as zombie
>  ** (psycopg2.OperationalError) FATAL: remaining connection slots are 
> reserved for non-replication superuser connections
> *Actions taken:*
>  * We suspected that there were not enough database connections, so we 
> increased the _AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE_ value from 5 to 50, but 
> the problem persisted.
>  * We reverted to version _1.10.7_ and everything works normally again.
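
One possible reason that raising AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE did not
help: the pool size applies per process, so every Airflow process that opens
its own SQLAlchemy pool can hold up to pool_size + max_overflow connections,
while PostgreSQL turns away non-superuser connections once
max_connections - superuser_reserved_connections slots are in use. The sketch
below is only back-of-the-envelope arithmetic under those assumptions. It is
not taken from the report and does not establish the root cause of the
regression. The function and class names are hypothetical, and the process
count (one pool per concurrent task) and the defaults (SQLAlchemy's
max_overflow of 10, PostgreSQL's stock max_connections of 100 and
superuser_reserved_connections of 3) are assumptions to be replaced with the
values from the actual deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionBudget:
    """Worst-case connection demand versus slots open to regular users."""

    worst_case_demand: int
    usable_slots: int

    @property
    def exceeds_limit(self) -> bool:
        # True when the pools could, in the worst case, ask for more
        # connections than PostgreSQL grants to non-superusers.
        return self.worst_case_demand > self.usable_slots


def estimate_budget(
    processes: int,
    pool_size: int,
    max_overflow: int = 10,       # assumed: SQLAlchemy QueuePool default
    max_connections: int = 100,   # assumed: stock PostgreSQL default
    superuser_reserved: int = 3,  # assumed: stock PostgreSQL default
) -> ConnectionBudget:
    """Estimate an upper bound, not the number of connections actually open.

    Assumes every process owns an independent pool that may grow to
    pool_size + max_overflow connections.
    """
    per_process_cap = pool_size + max_overflow
    return ConnectionBudget(
        worst_case_demand=processes * per_process_cap,
        usable_slots=max_connections - superuser_reserved,
    )


if __name__ == "__main__":
    # 27 concurrent tasks, as in the issue title, with one pool per task
    # (an assumption), before and after the pool-size change.
    for pool_size in (5, 50):
        budget = estimate_budget(processes=27, pool_size=pool_size)
        print(pool_size, budget.worst_case_demand, budget.usable_slots,
              budget.exceeds_limit)
```

Under these assumptions a larger pool raises the ceiling on what each process
may open without adding any slots on the server, so comparing the estimate
with the live connection count in pg_stat_activity would be a reasonable next
check.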



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
