Hi everyone!
We had been running the scheduler without problems for a while, on an EC2
instance (mx4.large), running airflow scheduler directly (no supervisord,
no monit, etc.).
Suddenly, the scheduler stopped, showing this message:
[2016-12-22 21:01:15,038] {jobs.py:574} INFO - Prioritizing 1 queued jobs
[2016-12-22 21:01:15,041] {jobs.py:603} INFO - Pool None has 128 slots, 1 task instances in queue
[2016-12-22 21:01:15,041] {models.py:154} INFO - Filling up the DagBag from /home/ec2-user/analytics/airflow/dags
[2016-12-22 21:01:15,155] {jobs.py:726} INFO - Starting 2 scheduler jobs
[2016-12-22 21:01:15,157] {jobs.py:761} ERROR - [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 728, in _execute
    j.start()
  File "/usr/lib64/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib64/python3.5/multiprocessing/context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib64/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib64/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib64/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 15, in <module>
The DAGs that failed didn't leave any logs (none were stored on the Airflow
instance, and we have no remote logging configured), so we have no idea
what happened, other than that there was not enough memory to fork.
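In the meantime we are thinking of logging available memory over time, so
the next crash at least leaves a trend to look at. A minimal sketch of what
we have in mind (a hypothetical standalone script reading MemAvailable from
/proc/meminfo; not part of Airflow):

import time

def mem_available_kb():
    # MemAvailable appears in /proc/meminfo on Linux 3.14+.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    return None

if __name__ == "__main__":
    while True:
        ts = time.strftime("%Y-%m-%d %H:%M:%S")
        print(ts, mem_available_kb(), "kB available", flush=True)
        time.sleep(60)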
It is well known that restarting the scheduler periodically is recommended
(according to this
<https://medium.com/handy-tech/airflow-tips-tricks-and-pitfalls-9ba53fba14eb#.80c6g1n1s>),
but... do you have any idea why this can happen? Is there something we can
do (or some bug we can fix)?
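For reference, the periodic restart we are considering looks roughly like
this (a minimal sketch; it assumes the 1.x scheduler's -n / --num_runs
flag, which makes the scheduler exit after N scheduling loops):

import subprocess
import time

# Restart loop: the scheduler exits on its own after 10 scheduling
# loops (-n / --num_runs), then we start a fresh process, releasing
# whatever memory the old one accumulated.
while True:
    subprocess.run(["airflow", "scheduler", "-n", "10"])
    time.sleep(5)  # brief pause before respawning

A real process supervisor (supervisord, systemd) would do the respawning
more robustly; the loop above is just the idea.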
Thanks in advance!