[
https://issues.apache.org/jira/browse/AIRFLOW-6912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Lam Phuc updated AIRFLOW-6912:
-------------------------------------
Description:
*Current working Airflow version:* 1.10.7
*Environment:* Kubernetes 1.13, Helm chart 5.2.4
*Airflow version that breaks:* 1.10.9
*Description:*
* We have a set of _ssh_operators_ tasks in a DAG that need to run in parallel
(as shown in the screenshot), and everything works fine on Airflow version
_1.10.7_.
* We tried to upgrade Airflow to version _1.10.9_, and the tasks now fail in
random order and in varying numbers (as shown in the screenshot).
* Here are some of the errors that we collected:
** (psycopg2.OperationalError) FATAL: remaining connection slots are reserved
for non-replication superuser connections
** Executor reports task instance <TaskInstance: <dag_name>.<task_name>
2020-02-20 12:20:00+00:00 [queued]> finished (failed) although the task says
its queued. Was the task killed externally?
** <TaskInstance: <dag_name>.<task_name> 2020-02-20 08:20:00+00:00 [running]>
detected as zombie
** (psycopg2.OperationalError) FATAL: remaining connection slots are reserved
for non-replication superuser connections
*Actions taken:*
* We suspected that there were not enough database connections, so we
increased the _AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE_ value from 5 to 50, but the
problem persisted.
* We reverted to version _1.10.7_ and everything works normally again.
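A rough sketch of why raising the pool size may not help with the "remaining connection slots are reserved" error. It assumes each Airflow process opens its own SQLAlchemy pool, which can grow to `pool_size + max_overflow` connections, and that PostgreSQL runs with its stock `max_connections` (100) and `superuser_reserved_connections` (3). The process count of 27 is hypothetical (one process per concurrent task), and the function names are made up for illustration.

```python
"""Rough estimate of metadata-database connection demand (illustrative only)."""


def max_connections_demanded(processes: int, pool_size: int, max_overflow: int = 10) -> int:
    """Upper bound on connections if every process fills its own pool.

    SQLAlchemy's QueuePool allows up to pool_size + max_overflow connections
    per engine; one engine per process is an assumption made here.
    """
    return processes * (pool_size + max_overflow)


def usable_slots(max_connections: int = 100, superuser_reserved: int = 3) -> int:
    """Slots open to ordinary roles (defaults are PostgreSQL's stock settings)."""
    return max_connections - superuser_reserved


if __name__ == "__main__":
    processes = 27  # hypothetical: one process per concurrent SSH task
    for pool_size in (5, 50):
        demand = max_connections_demanded(processes, pool_size)
        print(f"pool_size={pool_size}: up to {demand} connections, "
              f"{usable_slots()} slots usable")
```

Under these assumptions a larger pool raises the possible demand on the database rather than the supply of slots, so the server-side limit would also need to be checked.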
> Airflow unable to run concurrent ssh tasks (up to 27) in a single dag
> ---------------------------------------------------------------------
>
> Key: AIRFLOW-6912
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6912
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, database, executors
> Affects Versions: 1.10.9
> Environment: Kubernetes 1.13
> Reporter: Nguyen Lam Phuc
> Priority: Blocker
> Fix For: 1.10.7
>
> Attachments: version_1_10_7-working.png, version_1_10_9_break.png