The scheduler is probably single-threaded, but it's a good idea to confirm that and to check for PostgreSQL (or MySQL) locks:
https://wiki.postgresql.org/wiki/Lock_Monitoring

On Wed, Sep 7, 2016 at 8:30 AM, Bolke de Bruin <[email protected]> wrote:

> Thanks!
>
> Apache scrubs attachments. Can you please put it somewhere where it can be
> downloaded (pastebin).
>
> Full logs are more helpful than just the end. If those can be downloaded
> that would be nice.
>
> Bolke
>
> > On 7 Sep 2016, at 08:24, הילה ויזן <[email protected]> wrote:
> >
> > Hi,
> > we face the same issue with latest version.
> > environment:
> > airflow 1.7.1.3.
> > postgres 9.2.13 (backend DB)
> > OS Red Hat Enterprise Linux Server 7.2 (Maipo)
> > python 2.7.5
> > celery version 3.1.23
> > kombu 3.0.35
> > rabbitMQ 3.3.5
> >
> > airflow.config is attached
> > logs of scheduler and rabbitmq are too big, i can't attach them here.
> > do you want the end of the log?
> >
> > i'll be happy to provide more info....
> >
> > On Wed, Sep 7, 2016 at 8:40 AM, Bolke de Bruin <[email protected]> wrote:
> > 1.6.2 is quite old and many updates to the scheduler have been made.
> > Please make sure to use 1.7.1.3 or master.
> >
> > Also memory corruption requires more details as that indicates a problem
> > with the interpreter itself. Then you would get a core dump and a SIGSEGV.
> > Did you get those?
> >
> > Bolke
> >
> > Sent from my iPhone
> >
> > > On 7 Sep 2016, at 02:45, Lance Norskog <[email protected]> wrote:
> > >
> > > Add your Airflow version and your Python & OS.
> > > I'm on Py 2.7, Airflow 1.6.2 and have seen a few different manifestations
> > > of memory corruption.
> > >
> > >> On Sun, Sep 4, 2016 at 1:38 PM, Bolke de Bruin <[email protected]> wrote:
> > >>
> > >> That would be interesting, but dying - are you sure you are not
> > >> running with num_runs enabled?
> > >>
> > >> Yes please specify details.
> > >>
> > >> Sent from my iPad
> > >>
> > >> On 4 Sep 2016, at 15:57, Andrew Phillips <[email protected]> wrote:
> > >>
> > >>>> First and foremost I am assuming that getting “stuck” is only
> > >>>> happening when using a CeleryExecutor.
> > >>>
> > >>> We have seen repeated instances of the scheduler "dying" - i.e. no
> > >>> more scheduler threads in a ps output - with LocalExecutor too. If you
> > >>> feel this fits the description of "getting stuck", happy to provide
> > >>> more detail to try to get to a reproducible situation.
> > >>>
> > >>> Regards
> > >>>
> > >>> ap
> > >
> > > --
> > > Lance Norskog
> > > [email protected]
> > > Redwood City, CA
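
For the lock check suggested at the top of this message, here is a minimal sketch of one way to see which sessions are waiting on a lock. It is only an illustration, not something Airflow provides: `find_waiting_sessions` and `WAITING_LOCKS_SQL` are hypothetical names, the query assumes PostgreSQL 9.2 or later (where `pg_stat_activity` exposes `pid` and `query`), and `conn` is assumed to be any DB-API 2.0 connection to the Airflow metadata database.

```python
# Sketch only: list lock requests that are still waiting to be granted.
# Assumption: PostgreSQL 9.2+, where pg_stat_activity has `pid` and `query`.
# `find_waiting_sessions` is a hypothetical helper, not part of Airflow.

WAITING_LOCKS_SQL = """
SELECT l.pid, l.locktype, l.mode, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted
ORDER BY l.pid
"""


def find_waiting_sessions(conn):
    """Return one dict per lock request that has not been granted.

    `conn` is any DB-API 2.0 connection (for example one opened with a
    PostgreSQL driver) pointing at the Airflow metadata database.
    """
    cur = conn.cursor()
    try:
        cur.execute(WAITING_LOCKS_SQL)
        # Column names come from the cursor, so the dict keys follow the query.
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

Running this while the scheduler appears stuck and getting an empty list back would suggest the hang is not caused by a session waiting on a database lock; any rows returned show the waiting backend's pid and the query it is running.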
