Hi Sergei and Naddem, can you please describe in detail how you restart the scheduler? Do we have to check whether tasks are in the queued/running state before the restart?
Thanks,
Hila

On Thu, Sep 1, 2016 at 11:55 PM, Sergei Iakhnin <[email protected]> wrote:

> Hi Bolke,
>
> I'm not sure who you're directing the question to. In my case it does
> happen on celery. This was my original report:
>
> https://groups.google.com/forum/#!topic/airbnb_airflow/KrB9pp5ou3c
>
> On Thu, Sep 1, 2016 at 10:35 PM Bolke de Bruin <[email protected]> wrote:
>
> > Can you confirm that this happens on celery?
> >
> > It awfully sounds like this:
> > http://stackoverflow.com/questions/27737990/django-celery-queue-getting-stuck
> >
> > Sent from my iPhone
> >
> > > On 1 sep. 2016, at 21:59, Sergei Iakhnin <[email protected]> wrote:
> > >
> > > Alexandre talked about this being a known issue at least as far back as
> > > 10 months ago.
> > >
> > >> On Thu, 1 Sep 2016, 21:46 Bolke de Bruin, <[email protected]> wrote:
> > >>
> > >> Again please create a jira and add as much info as possible. Including
> > >> debug logs, executor logs, broker logs. If possible database dump.
> > >>
> > >> Note airflow version, celery version, rabbitmq/redis etc. provide config
> > >> details.
> > >>
> > >> We really need more info to hunt this down as it has been quite elusive.
> > >> And I/we have not been able to replicate it.
> > >>
> > >> Bolke
> > >>
> > >> Sent from my iPhone
> > >>
> > >>> On 1 sep. 2016, at 20:45, [email protected] wrote:
> > >>>
> > >>> We face exactly the same issue...
> > >>> I tried to describe it here this week,
> > >>> But no one had a solution.
> > >>>
> > >>> On 1 Sep 2016, at 17:54, Sergei Iakhnin <[email protected]> wrote:
> > >>>
> > >>>> As far as I know even Airbnb themselves restart their schedulers every
> > >>>> 30 minutes because of this issue. I ended up doing it as well with a
> > >>>> cron job after giving up hope that it would be fixed in the short term.
> > >>>>
> > >>>>> On Thu, 1 Sep 2016, 16:03 Charalampos Paravalos, <[email protected]> wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I am writing to ask for advice on an issue that I have with airflow
> > >>>>> and that I have not managed to resolve until now. I am wondering if
> > >>>>> someone else had something similar in the past.
> > >>>>>
> > >>>>> So, we use airflow to schedule DAGs that will run some jobs
> > >>>>> periodically (every 30min/1hr). Jobs run as normal etc., but there are
> > >>>>> times when suddenly, after DAGs are finished, the next scheduled jobs
> > >>>>> do not start at all. It seems like the server does not kick off the
> > >>>>> scheduled jobs at all, for any of the DAGs defined (so no jobs are
> > >>>>> running on our server). When that happens I have to restart the
> > >>>>> scheduler, and jobs are kicked off automatically after the restart.
> > >>>>> The jobs then run until this issue appears again (I noticed it
> > >>>>> happening every 1 or 2 days, which is quite often).
> > >>>>>
> > >>>>> This is very strange. I tried to upgrade to version 1.7.1.3 but the
> > >>>>> issue is still here. We use 32 concurrent jobs with celery workers, and
> > >>>>> the server is able to manage the load well.
> > >>>>>
> > >>>>> I believe it has to do with the scheduler, but can't understand why.
> > >>>>> Backfilled jobs maybe? Can this be?
> > >>>>>
> > >>>>> I am looking forward to hearing back from someone that has any ideas.
> > >>>>> Please let me know what information you might need about my setup
> > >>>>> anytime.
> > >>>>>
> > >>>>> Thanks for your help!
> > >>>>>
> > >>>>> Regards,
> > >>>>> Babis
> > >>>> --
> > >>>>
> > >>>> Sergei
> > > --
> > >
> > > Sergei
> >
> --
>
> Sergei
>
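
For reference, here is a rough sketch of what a periodic scheduler restart could look like as a small Python wrapper instead of a cron job. This is not what Sergei or Airbnb actually run; the helper names run_for and supervise, the grace period, and the pause between restarts are all made up for illustration, and the only thing taken from the thread is the 30 minute interval. It does not look at task state at all, so it does not answer whether a queued/running check is needed before the restart.

```python
import subprocess
import time


def run_for(cmd, max_runtime_s, grace_s=10.0):
    """Start cmd, let it run for at most max_runtime_s seconds, then stop it.

    Returns the exit code of the process (hypothetical helper).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Returns early if the process exits on its own.
        return proc.wait(timeout=max_runtime_s)
    except subprocess.TimeoutExpired:
        # Ask the process to stop, then force it if it ignores the request.
        proc.terminate()
        try:
            return proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            return proc.wait()


def supervise(cmd, max_runtime_s, cycles=None, pause_s=1.0):
    """Restart cmd every max_runtime_s seconds.

    cycles=None loops forever; a number limits the restarts (useful for tests).
    Returns the list of exit codes, one per cycle.
    """
    exit_codes = []
    while cycles is None or len(exit_codes) < cycles:
        exit_codes.append(run_for(cmd, max_runtime_s))
        time.sleep(pause_s)
    return exit_codes


# Assumed usage, with `airflow scheduler` on PATH and a 30 minute interval:
# supervise(["airflow", "scheduler"], max_runtime_s=30 * 60)
```

The wrapper sends a terminate request first and only kills the process if it has not exited after the grace period, so the scheduler gets a chance to shut down cleanly before the next start.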
