caljnj opened a new issue #14068:
URL: https://github.com/apache/airflow/issues/14068
**Apache Airflow version**:1.10.9
**Environment**:
- **Cloud provider or hardware configuration**:
- **OS** (e.g. from /etc/os-release): CentOS Linux 8 (Core)
- **Kernel** (e.g. `uname -a`): 4.18.0-147.5.1.el8_1.x86_64
- **Install tools**:
- **Others**:
**What happened**:
Airflow is producing a crazy amount of DNS queries (a few hundred per min)
for requestion name resolution for database server we have specified in the
`sql_alchemy_conn` variable
via auditd the logs look like this:
```
type=PROCTITLE msg=audit(02/04/21 13:17:39.711:630677) : proctitle=airflow
scheduler -- DagFileProcessorManager
type=SOCKADDR msg=audit(02/04/21 13:17:39.711:630677) : saddr={ fam=inet
laddr=192.168.8.8 lport=53 }
type=SYSCALL msg=audit(02/04/21 13:17:39.711:630677) : arch=x86_64
syscall=connect success=yes exit=0 a0=0x1e a1=0x7ff852737934 a2=0x10
a3=0x7ffea729eef4 items=0 ppid=5388 pid=14411 auid=someuser uid=someuser
gid=someuser euid=someuser suid=someuser fsuid=someuser egid=someuser
sgid=someuser fsgid=someuser tty=(none) ses=1 comm=airflow schedul
exe=/usr/libexec/platform-python3.6
subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=connect
```
where the process 5388 looked like this:
```
someuser 5388 5328 0 Jan20 ? 02:45:50 airflow scheduler --
DagFileProcessorManager
someuser 15018 5388 10 13:18 ? 00:00:00 airflow scheduler -
DagFileProcessor /home/someuser/airflow/dags/sykdomspulsen_test.py
someuser 15020 5388 4 13:18 ? 00:00:00 [airflow schedul] <defunct>
```
**What you expected to happen**:
I would appreciate some help setting up a debug environment with the same
config etc that i can use to tracing the code to see where these DNS calls are
being made...
I thought i coudl fix it by setting a static DNS record in the hosts file
but that completely broke airflow.
see below:
```
fhi.no airflow[5453]: [2021-02-04 13:46:56 +0100] [4154] [ERROR] Exception
in worker process
Feb 04 13:46:56 someserver airflow[5453]: Traceback (most recent call last):
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line
2285, in _wrap_poo
l_connect
Feb 04 13:46:56 someserver airflow[5453]: return fn()
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 363,
in connect
Feb 04 13:46:56 someserver airflow[5453]: return
_ConnectionFairy._checkout(self)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 773,
in _checkout
Feb 04 13:46:56 someserver airflow[5453]: fairy =
_ConnectionRecord.checkout(pool)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 492,
in checkout
Feb 04 13:46:56 someserver airflow[5453]: rec = pool._do_get()
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 139,
in _do_get
Feb 04 13:46:56 someserver airflow[5453]: self._dec_overflow()
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line
69, in __exit
__
Feb 04 13:46:56 someserver airflow[5453]: exc_value,
with_traceback=exc_tb,
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178,
in raise_
Feb 04 13:46:56 someserver airflow[5453]: raise exception
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 136,
in _do_get
Feb 04 13:46:56 someserver airflow[5453]: return
self._create_connection()
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 308,
in _create_conn
ection
Feb 04 13:46:56 someserver airflow[5453]: return _ConnectionRecord(self)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 437,
in __init__
Feb 04 13:46:56 someserver airflow[5453]:
self.__connect(first_connect_check=True)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 657,
in __connect
Feb 04 13:46:56 someserver airflow[5453]: pool.logger.debug("Error on
connect(): %s", e)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line
69, in __exit
__
Feb 04 13:46:56 someserver airflow[5453]: exc_value,
with_traceback=exc_tb,
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178,
in raise_
Feb 04 13:46:56 someserver airflow[5453]: raise exception
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 652,
in __connect
Feb 04 13:46:56 someserver airflow[5453]: connection =
pool._invoke_creator(self)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py",
line 114, in conn
ect
Feb 04 13:46:56 someserver airflow[5453]: return dialect.connect(*cargs,
**cparams)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line
488, in connect
Feb 04 13:46:56 someserver airflow[5453]: return
self.dbapi.connect(*cargs, **cparams)
Feb 04 13:46:56 someserver airflow[5453]: pyodbc.Error: ('HY000', '[HY000]
[Microsoft][ODBC Driver 17 for SQL Server]SSPI Provider: Server not
found in Kerberos database (851968) (SQLDriverConnect)')
Feb 04 13:46:56 someserver airflow[5453]: The above exception was the direct
cause of the following exception:
Feb 04 13:46:56 someserver airflow[5453]: Traceback (most recent call last):
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/gunicorn/arbiter.py", line
586, in spawn_w
orker
Feb 04 13:46:56 someserver airflow[5453]: worker.init_process()
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/gunicorn/workers/base.py",
line 135, in in
it_process
Feb 04 13:46:56 someserver airflow[5453]: self.load_wsgi()
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/gunicorn/workers/base.py",
line 144, in lo
ad_wsgi
Feb 04 13:46:56 someserver airflow[5453]: self.wsgi = self.app.wsgi()
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/gunicorn/app/base.py", line
67, in wsgi
Feb 04 13:46:56 someserver airflow[5453]: self.callable = self.load()
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py",
line 52, in load
Feb 04 13:46:56 someserver airflow[5453]: return self.load_wsgiapp()
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py",
line 41, in load
_wsgiapp
Feb 04 13:46:56 someserver airflow[5453]: return
util.import_app(self.app_uri)
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/gunicorn/util.py", line 361,
in import_app
Feb 04 13:46:56 someserver airflow[5453]: app = eval(obj, vars(mod))
Feb 04 13:46:56 someserver airflow[5453]: File "<string>", line 1, in
<module>
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/airflow/www/app.py", line
233, in cached_a
pp
Feb 04 13:46:56 someserver airflow[5453]: app = create_app(config,
testing)
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/airflow/www/app.py", line
88, in create_ap
p
Feb 04 13:46:56 someserver airflow[5453]: from airflow.www import views
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/airflow/www/views.py", line
2443, in <modu
le>
Feb 04 13:46:56 someserver airflow[5453]: class
ChartModelView(wwwutils.DataProfilingMixin, AirflowModelView):
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/airflow/www/views.py", line
2534, in Chart
ModelView
Feb 04 13:46:56 someserver airflow[5453]: 'conn_id': _connection_ids()
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/airflow/utils/db.py", line
74, in wrapper
Feb 04 13:46:56 someserver airflow[5453]: return func(*args, **kwargs)
Feb 04 13:46:56 someserver airflow[5453]: File
"/home/someuser/.local/lib/python3.6/site-packages/airflow/www/views.py", line
2440, in _conn
ection_ids
Feb 04 13:46:56 someserver airflow[5453]: .group_by(Connection.conn_id))]
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3403,
in __iter__
Feb 04 13:46:56 someserver airflow[5453]: return
self._execute_and_instances(context)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3425,
in _execute_an
d_instances
Feb 04 13:46:56 someserver airflow[5453]: querycontext,
self._connection_from_session, close_with_result=True
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3440,
in _get_bind_a
rgs
Feb 04 13:46:56 someserver airflow[5453]: mapper=self._bind_mapper(),
clause=querycontext.statement, **kw
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3418,
in _connection
_from_session
Feb 04 13:46:56 someserver airflow[5453]: conn =
self.session.connection(**kw)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line
1133, in connectio
n
Feb 04 13:46:56 someserver airflow[5453]:
execution_options=execution_options,
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line
1139, in _connecti
on_for_bind
Feb 04 13:46:56 someserver airflow[5453]: engine, execution_options
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 432,
in _connectio
n_for_bind
Feb 04 13:46:56 someserver airflow[5453]: conn =
bind._contextual_connect()
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line
2251, in _contextu
al_connect
Feb 04 13:46:56 someserver airflow[5453]:
self._wrap_pool_connect(self.pool.connect, None),
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line
2289, in _wrap_poo
l_connect
Feb 04 13:46:56 someserver airflow[5453]: e, dialect, self
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line
1555, in _handle_d
bapi_exception_noconnection
Feb 04 13:46:56 someserver airflow[5453]: sqlalchemy_exception,
with_traceback=exc_info[2], from_=e
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178,
in raise_
Feb 04 13:46:56 someserver airflow[5453]: raise exception
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line
2285, in _wrap_poo
l_connect
Feb 04 13:46:56 someserver airflow[5453]: return fn()
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 363,
in connect
Feb 04 13:46:56 someserver airflow[5453]: return
_ConnectionFairy._checkout(self)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 773,
in _checkout
Feb 04 13:46:56 someserver airflow[5453]: fairy =
_ConnectionRecord.checkout(pool)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 492,
in checkout
Feb 04 13:46:56 someserver airflow[5453]: rec = pool._do_get()
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 139,
in _do_get
Feb 04 13:46:56 someserver airflow[5453]: self._dec_overflow()
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line
69, in __exit
__
Feb 04 13:46:56 someserver airflow[5453]: exc_value,
with_traceback=exc_tb,
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178,
in raise_
Feb 04 13:46:56 someserver airflow[5453]: raise exception
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 136,
in _do_get
Feb 04 13:46:56 someserver airflow[5453]: return
self._create_connection()
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 308,
in _create_conn
ection
Feb 04 13:46:56 someserver airflow[5453]: return _ConnectionRecord(self)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 437,
in __init__
Feb 04 13:46:56 someserver airflow[5453]:
self.__connect(first_connect_check=True)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 657,
in __connect
Feb 04 13:46:56 someserver airflow[5453]: pool.logger.debug("Error on
connect(): %s", e)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line
69, in __exit
__
Feb 04 13:46:56 someserver airflow[5453]: exc_value,
with_traceback=exc_tb,
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178,
in raise_
Feb 04 13:46:56 someserver airflow[5453]: raise exception
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 652,
in __connect
Feb 04 13:46:56 someserver airflow[5453]: connection =
pool._invoke_creator(self)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py",
line 114, in conn
ect
Feb 04 13:46:56 someserver airflow[5453]: return dialect.connect(*cargs,
**cparams)
Feb 04 13:46:56 someserver airflow[5453]: File
"/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line
488, in connect
Feb 04 13:46:56 someserver airflow[5453]: return
self.dbapi.connect(*cargs, **cparams)
Feb 04 13:46:56 someserver airflow[5453]: sqlalchemy.exc.DBAPIError:
(pyodbc.Error) ('HY000', '[HY000] [Microsoft][ODBC Driver 17 for SQL Serv
er]SSPI Provider: Server not found in Kerberos database (851968)
(SQLDriverConnect)')
Feb 04 13:46:56 someserver airflow[5453]: (Background on this error at:
http://sqlalche.me/e/dbapi)
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [4154]
[INFO] Worker exiting (pid: 4154)
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [3569]
[INFO] Worker exiting (pid: 3569)[2021-02-04 13:46:56 +0100] [395
0] [INFO] Worker exiting (pid: 3950)
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [3192]
[INFO] Worker exiting (pid: 3192)
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [2804]
[INFO] Worker exiting (pid: 2804)
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [5481]
[INFO] Shutting down: Master
Feb 04 13:46:56 someserver airflow[5453]: [2021-02-04 13:46:56 +0100] [5481]
[INFO] Reason: Worker failed to boot.
Feb 04 13:49:26 someserver airflow[5453]: [2021-02-04 13:49:26,737]
{cli.py:859} ERROR - No response from gunicorn master within 120 seconds
Feb 04 13:49:26 someserver airflow[5453]: [2021-02-04 13:49:26,811]
{cli.py:860} ERROR - Shutting down webserver
Feb 04 13:49:29 someserver systemd[2859]: airflow-webserver.service: Main
process exited, code=exited, status=1/FAILURE
Feb 04 13:49:29 someserver systemd[2859]: airflow-webserver.service: Failed
with result 'exit-code'.
Feb 04 13:49:34 someserver systemd[2859]: airflow-webserver.service: Service
RestartSec=5s expired, scheduling restart.
Feb 04 13:49:34 someserver systemd[2859]: airflow-webserver.service:
Scheduled restart job, restart counter is at 1.
Feb 04 13:49:34 someserver systemd[2859]: Stopped Airflow webserver daemon.
Feb 04 13:49:34 someserver systemd[2859]: Started Airflow webserver daemon.
Feb 04 13:49:34 someserver airflow[4812]: [2021-02-04 13:49:34,842]
{settings.py:253} INFO - settings.configure_orm(): Using pool settings. po
ol_size=5, max_overflow=10, pool_recycle=1800, pid=4812
Feb 04 13:49:35 someserver airflow[4812]: [2021-02-04 13:49:35,683]
{cli_action_loggers.py:107} WARNING - Failed to log action with (pyodbc.Er
ror) ('HY000', '[HY000] [Microsoft][ODBC Driver 17 for SQL Server]SSPI
Provider: Server not found in Kerberos database (851968) (SQLDriverConnect)')
Feb 04 13:49:35 someserver airflow[4812]: (Background on this error at:
http://sqlalche.me/e/dbapi)
```
**How to reproduce it**:
I'm not sure how to repro this to be honest. i'm new to airflow and new to
the system im working on.
But more than happy to get some pointers on how to breakpoint the code
myself to see where the DNS calls are happening and find out what kind of loop
i'm in.
A wild guess is there is something wrong with the timers in the
`setup_event_handlers` function..
https://airflow.apache.org/docs/apache-airflow/1.10.3/_modules/airflow/utils/sqlalchemy.html
**Anything else we need to know**:
This problem occurs all day.
It seems to happen within the processes that are forked by the scheduler.
Any relevant logs to include? Put them here in side a detail tag:
<details><summary>x.log</summary> lots of stuff </details>
-->
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]