Ethan-Henley opened a new issue #20652:
URL: https://github.com/apache/airflow/issues/20652


   ### Apache Airflow version
   
   2.1.0
   
   ### What happened
   
   Noticed today that tasks which connect to a Postgres database via SQLAlchemy inconsistently succeed or fail with a connection timeout, apparently at random. When they fail, they produce the logs shown below.
   
   The Postgres server is running normally as far as our tests show. We did not see any such timeouts before the holiday and have pushed only one change since: adding a sensor at the start of the DAG where the problem was first noticed.
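   
   For reference, a minimal sketch of the kind of task involved; the connection URI, database name, and query are placeholders rather than the actual DAG code:
   
   ```
   # Hypothetical minimal version of the failing pattern: open a SQLAlchemy
   # connection to Postgres and run a trivial query. Host, credentials, and
   # database name are placeholders.
   from sqlalchemy import create_engine, text
   
   POSTGRES_URI = "postgresql+psycopg2://USER:PASS@POSTGRES_SERVER_ADDRESS:5432/DBNAME"
   
   def query_postgres():
       engine = create_engine(POSTGRES_URI)
       # On the runs that fail, psycopg2.OperationalError (connection timed out)
       # is raised here; other runs of the same task complete normally.
       with engine.connect() as conn:
           return conn.execute(text("SELECT 1")).scalar()
   ```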
   
   ### What you expected to happen
   
   I would expect this to consistently succeed (or to consistently fail, which would point to a clearer problem with the server or the connection).
   
   ### How to reproduce
   
   1. Set up Azure account and environment
   2. Create Azure Kubernetes Service (below)
   ```
   az aks create -g PROJECTNAME-rg -n kube-PROJECTNAME --node-vm-size Standard_D2_v3 --node-count 2 --generate-ssh-keys --nodepool-name system
   az aks nodepool update --resource-group PROJECTNAME-rg --cluster-name kube-PROJECTNAME --name system --mode System
   az aks nodepool add --resource-group PROJECTNAME-rg --cluster-name kube-PROJECTNAME --name computepool --node-vm-size Standard_D8_v3 --node-count 3 --labels agentpool=app01 --mode User
   az aks get-credentials --name kube-PROJECTNAME --resource-group PROJECTNAME-rg
   az aks nodepool update --resource-group PROJECTNAME-rg --cluster-name kube-PROJECTNAME --name computepool --enable-cluster-autoscaler --min-count 1 --max-count 3
   ```
   3. Create Postgres servers for Airflow and for project data
   ```
   az postgres server create --resource-group PROJECTNAME-rg --name PROJECTNAME-airflow --location eastus --admin-user <USER> --admin-password <PASS> --sku-name GP_Gen5_2 --version 11
   az postgres server firewall-rule create --resource-group PROJECTNAME-rg --server-name PROJECTNAME-airflow --name AirflowRule --start-ip-address 0.0.0.0 --end-ip-address 0.0.0.0
   az postgres server create --resource-group PROJECTNAME-rg --name PROJECTNAME --location eastus --admin-user <USER> --admin-password <PASS> --sku-name GP_Gen5_8 --version 11
   az postgres server firewall-rule create --resource-group PROJECTNAME-rg --server-name PROJECTNAME --name DataRule --start-ip-address 0.0.0.0 --end-ip-address 0.0.0.0
   az postgres server update \
       --resource-group PROJECTNAME-rg \
       --name PROJECTNAME \
       --storage-size 550000
   ```
   4. Set up DNS zone as per https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/azure.md
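   
   A small probe along these lines (host, credentials, and database name are placeholders) can be run from a pod in the cluster to check whether the raw connection to port 5432 also times out intermittently, independent of Airflow:
   
   ```
   # Hypothetical connectivity probe: attempt several raw psycopg2 connections
   # to the Postgres server with a short timeout and report which attempts fail.
   # Host, credentials, and database name are placeholders.
   import time
   import psycopg2
   
   for attempt in range(10):
       try:
           conn = psycopg2.connect(
               host="POSTGRES_SERVER_ADDRESS",
               port=5432,
               dbname="DBNAME",
               user="USER",
               password="PASS",
               connect_timeout=10,  # seconds to wait for the connection
           )
           conn.close()
           print(f"attempt {attempt}: connected")
       except psycopg2.OperationalError as exc:
           print(f"attempt {attempt}: failed ({exc})")
       time.sleep(5)
   ```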
   
   ### Operating System
   
   Running this in Kubernetes on an Azure server.
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-microsoft-azure==1.0.0
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Helm version 3.0, installed as per https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3, deploying the chart
   helm.sh/chart: airflow-2.1.0
   
   
   ### Anything else
   
   
   Log from a failed task run:
   ```
   Running <TaskInstance: DAG_NAME.TASK_NAME DATE_TIME [queued]> on host POD_ADDRESS
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
       return fn()
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 364, in connect
       return _ConnectionFairy._checkout(self)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
       fairy = _ConnectionRecord.checkout(pool)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
       rec = pool._do_get()
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 241, in _do_get
       return self._create_connection()
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
       return _ConnectionRecord(self)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
       self.__connect(first_connect_check=True)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
       pool.logger.debug("Error on connect(): %s", e)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
       compat.raise_(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
       connection = pool._invoke_creator(self)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
       return dialect.connect(*cargs, **cparams)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 508, in connect
       return self.dbapi.connect(*cargs, **cparams)
     File "/home/airflow/.local/lib/python3.8/site-packages/psycopg2/__init__.py", line 127, in connect
       conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
   psycopg2.OperationalError: could not connect to server: Connection timed out
           Is the server running on host "POSTGRES_SERVER_ADDRESS" (POSTGRES_SERVER_IP) and accepting
           TCP/IP connections on port 5432?
   ```
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

