[ https://issues.apache.org/jira/browse/AIRFLOW-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sajid Sajid reassigned AIRFLOW-5214:
------------------------------------
Assignee: Sajid Sajid
> Airflow leaves too many TIME_WAIT TCP connections
> -------------------------------------------------
>
> Key: AIRFLOW-5214
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5214
> Project: Apache Airflow
> Issue Type: Bug
> Components: DagRun, database
> Affects Versions: 1.10.2, 1.10.4
> Environment: CentOS 7, Airflow 1.10.4, Maria DB
> Reporter: Oliver Ricken
> Assignee: Sajid Sajid
> Priority: Critical
>
> Dear experts,
> In Airflow versions 1.10.2 and 1.10.4, we are experiencing a severe problem
> that limits the number of concurrent tasks.
> We observe that when more than 8 tasks are started and executed in parallel,
> the majority of them fail with the error "Can't connect to MySQL
> server" and error code 2006 (99). This error code boils down to "Cannot bind
> socket to resource", which is why we started looking into the TCP connections
> of our Airflow host (a single node that hosts the webserver, scheduler and
> worker).
> While the 8 tasks are running simultaneously, we observe more than 15,000
> TIME_WAIT connections but fewer than 50 established ones. Given that the
> number of available ports is somewhat below 30,000, this large number
> of blocked but unused TCP connections would explain why further task
> executions fail.
> Can anyone explain how so many open connections blocking ports/sockets
> come about? Since we have connection pooling enabled, we do not yet see
> any explanation.
> Your help is very much appreciated; this issue severely limits our current
> performance!
> Cheers
> Oliver
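The TIME_WAIT and ESTABLISHED counts quoted in the report can be reproduced on a Linux host with a sketch like the one below. It assumes the kernel's /proc/net/tcp text format, where the fourth column is the TCP state in hex (01 = ESTABLISHED, 06 = TIME_WAIT). The helper names `count_tcp_states` and `host_tcp_states` are hypothetical and not part of Airflow. `ss -tan state time-wait` lists the same sockets from the command line.

```python
import os
from collections import Counter

# Hex state codes as they appear in /proc/net/tcp (Linux include/net/tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
}


def count_tcp_states(proc_text):
    """Count sockets per TCP state in text formatted like /proc/net/tcp."""
    counts = Counter()
    for line in proc_text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[3] is the "st" column; anything not listed above is lumped as OTHER.
        counts[TCP_STATES.get(fields[3], "OTHER")] += 1
    return counts


def host_tcp_states():
    """Sum IPv4 and IPv6 socket states on a Linux host (empty elsewhere)."""
    total = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):
            with open(path) as f:
                total += count_tcp_states(f.read())
    return total


if __name__ == "__main__":
    states = host_tcp_states()
    print("TIME_WAIT:", states["TIME_WAIT"], "ESTABLISHED:", states["ESTABLISHED"])
```

Running it repeatedly while the parallel tasks execute shows whether the TIME_WAIT count grows with the number of task processes.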
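The port-exhaustion argument in the report can be checked with simple arithmetic. This sketch assumes Linux defaults: an ephemeral port range read from /proc/sys/net/ipv4/ip_local_port_range (32768 to 60999 by default, i.e. 28,232 ports) and a fixed TIME_WAIT duration of 60 seconds on the side that closes the connection first. Unless net.ipv4.tcp_tw_reuse is enabled, each new connection to the same MySQL host and port needs a local port that is not still in TIME_WAIT, so the port count divided by 60 s roughly bounds the sustainable rate of new connections to one server. The function names are hypothetical.

```python
import os

DEFAULT_PORT_RANGE = (32768, 60999)  # Linux default ip_local_port_range
TIME_WAIT_SECONDS = 60                # Linux TCP_TIMEWAIT_LEN, fixed in the kernel


def ephemeral_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Return (low, high) from the kernel setting, or the Linux default."""
    if os.path.exists(path):
        with open(path) as f:
            low, high = (int(x) for x in f.read().split())
        return low, high
    return DEFAULT_PORT_RANGE


def max_connection_rate(port_range, time_wait_seconds=TIME_WAIT_SECONDS):
    """Rough upper bound on new connections per second to ONE remote host:port,
    assuming every closed connection holds its local port in TIME_WAIT for the
    full interval and tcp_tw_reuse is off."""
    low, high = port_range
    available = high - low + 1
    return available / time_wait_seconds


if __name__ == "__main__":
    rng = ephemeral_port_range()
    ports = rng[1] - rng[0] + 1
    print(f"{ports} ephemeral ports, ~{max_connection_rate(rng):.0f} new connections/s sustainable")
```

With the default range this gives about 470 new connections per second. If the 15,000 TIME_WAIT sockets reported above all point at the same MariaDB server, they already occupy more than half of the 28,232 available ports.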
--
This message was sent by Atlassian Jira
(v8.20.1#820001)