We don't handle this kind of error in Airflow, so it becomes a hard error and
Airflow bails out.

You are most likely running out of memory because some other process is taking
up all that remains. Are you running workers on the same machine? Their memory
usage will go up and down over time depending on the jobs you launch.

This is not related to "restarting the scheduler" (which is outdated advice
anyway).
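To see whether the box is actually running low before the scheduler tries to
fork, you can read /proc/meminfo. A minimal sketch (Linux-only; the helper name
`available_memory_kb` is mine, not part of Airflow):

```python
def available_memory_kb():
    """Estimate available memory in kB from /proc/meminfo (Linux-only)."""
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            # Each value is "<number> kB" (or a bare number for HugePages_*).
            fields[key] = int(value.strip().split()[0])
    # MemAvailable exists on kernels >= 3.14; fall back to free + page cache.
    return fields.get("MemAvailable",
                      fields.get("MemFree", 0) + fields.get("Cached", 0))

if __name__ == "__main__":
    print("available: %d kB" % available_memory_kb())
```

If that number is near zero when the ERROR appears, the os.fork() failure in
the traceback below is exactly what you'd expect.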

Bolke

Sent from my iPhone

> On 26 Dec 2016, at 21:47, Teresa Fontanella De Santis 
> <[email protected]> wrote:
> 
> Hi everyone!
> 
> We had been running the scheduler without problems for a while. We are using
> an EC2 instance (mx4.large). We were running airflow scheduler directly (no
> supervisord, no monit, etc.).
> Suddenly, the scheduler stopped, showing this message:
> 
> [2016-12-22 21:01:15,038] {jobs.py:574} INFO - Prioritizing 1 queued
> jobs
> [742/1767]
> [2016-12-22 21:01:15,041] {jobs.py:603} INFO - Pool None has 128 slots, 1
> task instances in
> queue
> 
> [2016-12-22 21:01:15,041] {models.py:154} INFO - Filling up the DagBag from
> /home/ec2-user/analytics/airflow/dags
> 
> [2016-12-22 21:01:15,155] {jobs.py:726} INFO - Starting 2 scheduler
> jobs
> 
> [2016-12-22 21:01:15,157] {jobs.py:761} ERROR - [Errno 12] Cannot allocate
> memory
> 
> Traceback (most recent call
> last):
> 
>  File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 728,
> in
> _execute
> 
> 
> j.start()
> 
>  File "/usr/lib64/python3.5/multiprocessing/process.py", line 105, in
> start
> 
>    self._popen =
> self._Popen(self)
> 
>  File "/usr/lib64/python3.5/multiprocessing/context.py", line 212, in
> _Popen
> 
>    return
> _default_context.get_context().Process._Popen(process_obj)
> 
>  File "/usr/lib64/python3.5/multiprocessing/context.py", line 267, in
> _Popen
> 
>    return
> Popen(process_obj)
> 
>  File "/usr/lib64/python3.5/multiprocessing/popen_fork.py", line 20, in
> __init__
> 
> 
> self._launch(process_obj)
> 
>  File "/usr/lib64/python3.5/multiprocessing/popen_fork.py", line 67, in
> _launch
> 
>    self.pid =
> os.fork()
> 
> OSError: [Errno 12] Cannot allocate
> memory
> 
> Traceback (most recent call last):
>  File "/usr/local/bin/airflow", line 15, in <module>
> 
> 
> The DAGs which failed didn't leave any logs (none were stored on the Airflow
> instance and there are no remote logs), so we have no idea what could have
> happened, only that there was not enough memory to fork.
> It is well known that restarting the scheduler periodically is recommended
> (according to this
> <https://medium.com/handy-tech/airflow-tips-tricks-and-pitfalls-9ba53fba14eb#.80c6g1n1s>),
> but... do you have any idea why this can happen? Is there something we can
> do (or some bug we can fix)?
> 
> 
> Thanks in advance!
