Hi Harish,

The below does *not* indicate a scheduler hang; it is a valid exception, as mentioned earlier.
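For context: the trace you pasted is the dag file import timeout firing, not the scheduler wedging. If a DAG file takes longer to parse than the dagbag_import_timeout configured in airflow.cfg (30 seconds by default, if I remember correctly), the scheduler abandons that file and raises AirflowTaskTimeout. Roughly, the mechanism looks like the sketch below; this is only an illustration of the SIGALRM-based timeout, not Airflow's actual code:

    import signal

    class ImportTimeout(Exception):
        pass

    class timeout(object):
        """Raise ImportTimeout if the wrapped block runs longer than `seconds`."""

        def __init__(self, seconds, error_message='Timeout'):
            self.seconds = int(seconds)
            self.error_message = error_message

        def handle_timeout(self, signum, frame):
            raise ImportTimeout(self.error_message)

        def __enter__(self):
            # Arrange for SIGALRM to fire after `seconds` and abort the block.
            signal.signal(signal.SIGALRM, self.handle_timeout)
            signal.alarm(self.seconds)

        def __exit__(self, exc_type, exc_value, tb):
            signal.alarm(0)  # cancel the pending alarm

    # The scheduler wraps DAG file parsing in something along these lines:
    # with timeout(dagbag_import_timeout):
    #     m = imp.load_source(mod_name, filepath)

So the fix is to make pipeline.py cheaper to import (it seems to do a fair amount of work at module level) or to raise dagbag_import_timeout, rather than restarting the scheduler.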
Bolke.

> On 24 Mar 2017, at 19:07, harish singh <harish.sing...@gmail.com> wrote:
>
> We have been using (1.7) over a year and never faced this issue.
> The moment we switched to 1.8, I think we have hit this issue.
> The reason why I say "I think" is because I am not sure if it is the same
> issue. But whenever I restart, my pipeline proceeds.
>
> Airflow 1.7:
> Having said that, in 1.7 I did face a similar issue (less than 5 times over
> a year): I saw that there were a lot of processes marked "<defunct>" with
> the parent process being "scheduler".
>
> Somebody mentioned it in this jira ->
> https://issues.apache.org/jira/browse/AIRFLOW-401
> Workaround: restart the scheduler.
>
> Airflow 1.8:
> Now the issue in 1.8 may be different than the issue in 1.7, but again the
> issue gets solved and the pipeline progresses on a SCHEDULER RESTART.
>
> If it may help, this is the trace in 1.8:
>
> [2017-03-22 19:35:16,332] {models.py:167} INFO - Filling up the DagBag from /usr/local/airflow/pipeline/pipeline.py
> [2017-03-22 19:35:22,451] {airflow_configuration.py:40} INFO - loading setup.cfg file
> [2017-03-22 19:35:51,041] {timeout.py:37} ERROR - Process timed out
> [2017-03-22 19:35:51,041] {models.py:266} ERROR - Failed to import: /usr/local/airflow/pipeline/pipeline.py
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 263, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/usr/local/airflow/pipeline/pipeline.py", line 167, in <module>
>     create_tasks(dbguid, version, dag, override_start_date)
>   File "/usr/local/airflow/pipeline/pipeline.py", line 104, in create_tasks
>     t = create_task(dbguid, dag, taskInfo, version, override_date)
>   File "/usr/local/airflow/pipeline/pipeline.py", line 85, in create_task
>     retries, 1, depends_on_past, version, override_dag_date)
>   File "/usr/local/airflow/pipeline/dags/base_pipeline.py", line 90, in create_python_operator
>     depends_on_past=depends_on_past)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/decorators.py", line 86, in wrapper
>     result = func(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/operators/python_operator.py", line 65, in __init__
>     super(PythonOperator, self).__init__(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/decorators.py", line 70, in wrapper
>     sig = signature(func)
>   File "/usr/local/lib/python2.7/dist-packages/funcsigs/__init__.py", line 105, in signature
>     return Signature.from_function(obj)
>   File "/usr/local/lib/python2.7/dist-packages/funcsigs/__init__.py", line 594, in from_function
>     __validate_parameters__=False)
>   File "/usr/local/lib/python2.7/dist-packages/funcsigs/__init__.py", line 518, in __init__
>     for param in parameters))
>   File "/usr/lib/python2.7/collections.py", line 52, in __init__
>     self.__update(*args, **kwds)
>   File "/usr/lib/python2.7/_abcoll.py", line 548, in update
>     self[key] = value
>   File "/usr/lib/python2.7/collections.py", line 61, in __setitem__
>     last[1] = root[0] = self.__map[key] = [last, root, key]
>   File "/usr/local/lib/python2.7/dist-packages/airflow/utils/timeout.py", line 38, in handle_timeout
>     raise AirflowTaskTimeout(self.error_message)
> AirflowTaskTimeout: Timeout
>
> On Fri, Mar 24, 2017 at 5:45 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
>> We are running *without* num runs for over a year (and never have). It is
>> a very elusive issue which has not been reproducible.
>>
>> I'd like more info on this, but it needs to be very elaborate, even to
>> the point of access to the system exposing the behavior.
>>
>> Bolke
>>
>> Sent from my iPhone
>>
>>> On 24 Mar 2017, at 16:04, Vijay Ramesh <vi...@change.org> wrote:
>>>
>>> We literally have a cron job that restarts the scheduler every 30 min.
>>> Num runs didn't work consistently in rc4; sometimes it would restart
>>> itself and sometimes we'd end up with a few zombie scheduler processes
>>> and things would get stuck. Also running locally, without celery.
>>>
>>>> On Mar 24, 2017 16:02, <lro...@quartethealth.com> wrote:
>>>>
>>>> We have max runs set and still hit this. Our solution is dumber:
>>>> monitoring log output, and killing the scheduler if it stops emitting.
>>>> Works like a charm.
>>>>
>>>>> On Mar 24, 2017, at 5:50 PM, F. Hakan Koklu <fhakan.ko...@gmail.com> wrote:
>>>>>
>>>>> Some solutions to this problem are restarting the scheduler frequently
>>>>> or some sort of monitoring on the scheduler. We have set up a dag that
>>>>> pings cronitor <https://cronitor.io/> (a dead man's snitch type of
>>>>> service) every 10 minutes, and the snitch pages you when the scheduler
>>>>> dies and does not send a ping to it.
>>>>>
>>>>> On Fri, Mar 24, 2017 at 1:49 PM, Andrew Phillips <aphill...@qrmedia.com>
>>>>> wrote:
>>>>>
>>>>>>> We use celery and run into it from time to time.
>>>>>>
>>>>>> Bang goes my theory ;-) At least, assuming it's the same underlying
>>>>>> cause...
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> ap
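For what it's worth, a canary dag along the lines Hakan describes can be as small as the sketch below. The ping URL, dag_id, owner and start_date are placeholders, and it assumes the 1.8-style PythonOperator import path; adapt it to whatever dead man's snitch service you use:

    from datetime import datetime

    import requests

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    # Placeholder: use the ping URL your monitoring service gives you.
    PING_URL = 'https://cronitor.link/YOUR-MONITOR-CODE/run'

    def ping_monitor():
        # If the scheduler stops scheduling, these pings stop and the
        # monitoring service pages you.
        requests.get(PING_URL, timeout=10)

    dag = DAG(
        dag_id='scheduler_canary',
        start_date=datetime(2017, 3, 1),
        schedule_interval='*/10 * * * *',  # every 10 minutes
        default_args={'owner': 'airflow', 'retries': 0},
    )

    PythonOperator(
        task_id='ping_monitor',
        python_callable=ping_monitor,
        dag=dag,
    )

The nice property is that it exercises the whole scheduling path, so it also catches the "scheduler process alive but not scheduling" case that a plain process check misses.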
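The log-watchdog approach is even simpler to sketch: check how long ago the scheduler log was last written, and restart the scheduler if it has gone quiet. The log path and restart command below are assumptions and will differ per deployment; run something like this from cron every few minutes:

    import os
    import subprocess
    import time

    SCHEDULER_LOG = '/var/log/airflow/scheduler.log'                  # placeholder path
    STALE_AFTER_SECONDS = 300                                         # 5 minutes of silence
    RESTART_CMD = ['supervisorctl', 'restart', 'airflow-scheduler']   # placeholder command

    def log_is_stale():
        # The mtime only advances while the scheduler is still writing output.
        return time.time() - os.path.getmtime(SCHEDULER_LOG) > STALE_AFTER_SECONDS

    if __name__ == '__main__':
        if log_is_stale():
            subprocess.check_call(RESTART_CMD)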