Reaping it basically means calling "is_alive()" followed by a "join()", then restarting the worker. In this case, though, it would aggravate the situation even more: the OOM condition would persist for much longer, potentially not even leaving enough memory for anyone to log in on a shell. It's a possibility, but some thought has to go into the mechanism under which restarts take place.
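To make that concrete, here's a minimal sketch of what such a reaping loop could look like. This is not Airflow code: the "work" function and "make_worker" helper are hypothetical stand-ins for LocalWorker, and the restart policy is deliberately naive to show the problem (under OOM it just re-triggers the failure):

```python
import multiprocessing
import time

def work(n):
    # Placeholder for the real task body (e.g. running "airflow run ...").
    time.sleep(n)

def make_worker():
    p = multiprocessing.Process(target=work, args=(0.1,))
    p.start()
    return p

workers = [make_worker() for _ in range(4)]

# Reaping loop: detect dead workers, collect their exit status, respawn.
for _ in range(3):
    for i, w in enumerate(workers):
        if not w.is_alive():
            w.join()                    # reap: clears the zombie, records exitcode
            workers[i] = make_worker()  # naive restart; under OOM this simply
                                        # forks again into the same exhausted box
    time.sleep(0.2)

for w in workers:
    w.join()
```

This is exactly where the "some thinking has to be done" part comes in: a restart policy would at minimum need backoff and a check on available memory before forking again.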
Reducing parallelism could work, but it depends on the number of memory-intensive tasks. There's a pool in place (sensor-pool) whose slot count you could/should reduce to prevent this from happening. How to deal with this effectively will provoke interesting discussion. I like the simple approach: use pools tactically to bound the memory-hungry tasks, and keep continuous monitoring / health checking in place on schedulers and workers to see how they are doing. Both cgroups and restarts can cause undesirable side effects and don't 100% solve the original problem. G> On Mon, Mar 27, 2017 at 11:44 PM, Bolke de Bruin <bdbr...@gmail.com> wrote: > Resource issues (like OOM) make sense as they are really hard to recover > from. In this case I assume you are probably running heavy lifting (memory > intensive) jobs on your machine. Reducing the parallelism parameter (if I > remember correctly) will probably help you or increasing the memory > available to your airflow machine. > > For a reference architecture I would always run with a celery installation > and have workers separate from the scheduler machine. > > Going forward we might be able to use cgroups for the local executor, but > I'm not sure if we want to do that. > > B. 
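For concreteness, the knobs involved look roughly like this in airflow.cfg. The setting names are the real ones quoted further down this thread; the values here are illustrative for a memory-constrained box, not recommendations:

```ini
[core]
executor = LocalExecutor
# Upper bound on LocalWorker processes the scheduler forks;
# lowering this from the default 32 directly caps peak memory use.
parallelism = 8
# Tasks allowed to run concurrently within a single DAG.
dag_concurrency = 4
```

The sensor-pool slot count itself is not in the config file; it's set in the web UI (Admin -> Pools), or via the "airflow pool" CLI subcommand if your version has it.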
> > Sent from my iPhone > > > On 27 Mar 2017, at 13:37, Nicholas Hodgkinson <nik.hodgkinson@ > collectivehealth.com> wrote: > > > > Actually, something pretty interesting; it seems I'm hitting OOM: > > > > [2017-03-25 02:31:40,546] {local_executor.py:31} INFO - LocalWorker > running > > airflow run AUTOMATOR-sensor-v2 SENSOR--jira_case_close_times > > 2017-03-25T02:25:00 --local --pool sensor-pool -sd > > DAGS_FOLDER/automator-sensor.py > > Process LocalWorker-17: > > Traceback (most recent call last): > > File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in > > _bootstrap > > self.run() > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/ > executors/local_executor.py", > > line 34, in run > > subprocess.check_call(command, shell=True) > > File "/usr/lib/python2.7/subprocess.py", line 535, in check_call > > retcode = call(*popenargs, **kwargs) > > File "/usr/lib/python2.7/subprocess.py", line 522, in call > > return Popen(*popenargs, **kwargs).wait() > > File "/usr/lib/python2.7/subprocess.py", line 710, in __init__ > > errread, errwrite) > > File "/usr/lib/python2.7/subprocess.py", line 1223, in _execute_child > > self.pid = os.fork() > > OSError: [Errno 12] Cannot allocate memory > > > > I have several of these before the process dies, however in the case of > > this set the whole scheduler dies and is automatically restarted: > > > > [2017-03-25 02:31:41,080] {local_executor.py:38} ERROR - failed to > execute > > task Command 'exec bash -c 'airflow run AUTOMATOR-sensor-v2 > > SENSOR--jira_case_respons > > e_times 2017-03-25T02:25:00 --local --pool sensor-pool -sd > > DAGS_FOLDER/automator-sensor.py '' returned non-zero exit status -11: > > Traceback (most recent call last): > > File "/usr/local/bin/airflow", line 15, in <module> > > args.func(args) > > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line > > 206, in run > > dag = get_dag(args) > > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line > > 
76, in get_dag > > 'dag_id could not be found: {}'.format(args.dag_id)) > > airflow.exceptions.AirflowException: dag_id could not be found: > > AUTOMATOR-sensor-v2 > > [2017-03-25 02:31:41,286] {jobs.py:726} INFO - Starting 2 scheduler jobs > > [2017-03-25 02:31:41,287] {jobs.py:761} ERROR - [Errno 12] Cannot > allocate > > memory > > Traceback (most recent call last): > > File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line > 728, > > in _execute > > j.start() > > File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in > start > > self._popen = Popen(self) > > File "/usr/lib/python2.7/multiprocessing/forking.py", line 121, in > > __init__ > > self.pid = os.fork() > > OSError: [Errno 12] Cannot allocate memory > > Traceback (most recent call last): > > File "usr/local/bin/airflow", line 15, in <module> > > args.func(args) > > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line > > 455, in scheduler > > job.run() > > File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line > 173, > > in run > > self._execute() > > File "/usr/local/lib/python2.7/dist-packages/airflow/jobs.py", line > 728, > > in _execute > > j.start() > > File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in > start > > self._popen = Popen(self) > > File "/usr/lib/python2.7/multiprocessing/forking.py", line 121, in > > __init__ > > self.pid = os.fork() > > OSError: [Errno 12] Cannot allocate memory > > OpenBLAS blas_thread_init: pthread_create: Resource temporarily > unavailable > > OpenBLAS blas_thread_init: RLIMIT_NPROC 29898 current, 29898 max > > OpenBLAS blas_thread_init: pthread_create: Resource temporarily > unavailable > > OpenBLAS blas_thread_init: RLIMIT_NPROC 29898 current, 29898 max > > Traceback (most recent call last): > > File "/usr/local/bin/airflow", line 4, in <module> > > Traceback (most recent call last): > > File "/usr/local/bin/airflow", line 4, in <module> > > from airflow import configuration > > File 
"/usr/local/lib/python2.7/dist-packages/airflow/__init__.py", line > > 76, in <module> > > from airflow import configuration > > File "/usr/local/lib/python2.7/dist-packages/airflow/__init__.py", line > > 76, in <module> > > from airflow import operators > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/operators/__init__.py", > > line 10, in <module> > > from airflow import operators > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/operators/__init__.py", > > line 10, in <module> > > 'IntervalCheckOperator', > > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/helpers.py", > > line 94, in import_module_attrs > > 'IntervalCheckOperator', > > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/helpers.py", > > line 94, in import_module_attrs > > module = imp.load_module(mod, f, filename, description) > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/ > operators/check_operator.py", > > line 6, in <module> > > module = imp.load_module(mod, f, filename, description) > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/ > operators/check_operator.py", > > line 6, in <module> > > from airflow.hooks import BaseHook > > File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/__ > init__.py", > > line 30, in <module> > > from airflow.hooks import BaseHook > > File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/__ > init__.py", > > line 30, in <module> > > _import_module_attrs(globals(), _hooks) > > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/helpers.py", > > line 94, in import_module_attrs > > module = imp.load_module(mod, f, filename, description) > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/hooks/postgres_hook.py", > > line 4, in <module> > > _import_module_attrs(globals(), _hooks) > > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/helpers.py", > > line 94, in import_module_attrs > > module = imp.load_module(mod, f, filename, description) > > File > > 
"/usr/local/lib/python2.7/dist-packages/airflow/hooks/postgres_hook.py", > > line 4, in <module> > > from airflow.hooks.dbapi_hook import DbApiHook > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", > line > > 5, in <module> > > from airflow.hooks.dbapi_hook import DbApiHook > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", > line > > 5, in <module> > > import numpy > > File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line > > 142, in <module> > > import numpy > > File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line > > 142, in <module> > > from . import add_newdocs > > File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", > line > > 13, in <module> > > from . import add_newdocs > > File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", > line > > 13, in <module> > > from numpy.lib import add_newdoc > > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", > line > > 8, in <module> > > from numpy.lib import add_newdoc > > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", > line > > 8, in <module> > > [2017-03-25 02:31:42,314] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/Grammar.txt > > from .type_check import * > > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/type_check.py", > > line 11, in <module> > > from .type_check import * > > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/type_check.py", > > line 11, in <module> > > import numpy.core.numeric as _nx > > File "/usr/local/lib/python2.7/dist-packages/numpy/core/__init__.py", > > line 16, in <module> > > import numpy.core.numeric as _nx > > File "/usr/local/lib/python2.7/dist-packages/numpy/core/__init__.py", > > line 16, in <module> > > from . import multiarray > > KeyboardInterrupt > > from . 
import multiarray > > KeyboardInterrupt > > [2017-03-25 02:31:42,930] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/PatternGrammar.txt > > [2017-03-25 02:31:44,741] {__init__.py:36} INFO - Using executor > > LocalExecutor > > Logging into: > > /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-name/ > 2017-03-25T02:25:00 > > [2017-03-25 02:31:44,913] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/Grammar.txt > > [2017-03-25 02:31:44,964] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/PatternGrammar.txt > > ____________ _____________ > > ____ |__( )_________ __/__ /________ __ > > ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / / > > ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ / > > _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/ > > [2017-03-25 02:31:45,668] {jobs.py:680} INFO - Starting the scheduler > > > > Further on there is another round like the first round of OOMs, but the > > scheduler does not die like the time before; it continues to run. 
I then > > get this *very* strange error: > > > > [2017-03-25 02:32:26,570] {__init__.py:36} INFO - Using executor > > LocalExecutor > > [2017-03-25 02:32:26,600] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/Grammar.txt > > [2017-03-25 02:32:26,611] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/PatternGrammar.txt > > [2017-03-25 02:32:26,765] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/PatternGrammar.txt > > OpenBLAS blas_thread_init: pthread_create: Resource temporarily > unavailable > > OpenBLAS blas_thread_init: RLIMIT_NPROC 29898 current, 29898 max > > Traceback (most recent call last): > > File "/usr/local/bin/airflow", line 4, in <module> > > from airflow import configuration > > File "/usr/local/lib/python2.7/dist-packages/airflow/__init__.py", line > > 76, in <module> > > from airflow import operators > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/operators/__init__.py", > > line 10, in <module> > > 'IntervalCheckOperator', > > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/helpers.py", > > line 94, in import_module_attrs > > module = imp.load_module(mod, f, filename, description) > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/ > operators/check_operator.py", > > line 6, in <module> > > from airflow.hooks import BaseHook > > File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/__ > init__.py", > > line 30, in <module> > > _import_module_attrs(globals(), _hooks) > > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/helpers.py", > > line 94, in import_module_attrs > > module = imp.load_module(mod, f, filename, description) > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/hooks/postgres_hook.py", > > line 4, in <module> > > from airflow.hooks.dbapi_hook import DbApiHook > > File > > "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", > line > > 5, in 
<module> > > import numpy > > File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line > > 142, in <module> > > from . import add_newdocs > > File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", > line > > 13, in <module> > > from numpy.lib import add_newdoc > > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", > line > > 8, in <module> > > from .type_check import * > > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/type_check.py", > > line 11, in <module> > > import numpy.core.numeric as _nx > > File "/usr/local/lib/python2.7/dist-packages/numpy/core/__init__.py", > > line 16, in <module> > > from . import multiarray > > KeyboardInterrupt > > [2017-03-25 02:32:27,770] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/Grammar.txt > > [2017-03-25 02:32:27,819] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/PatternGrammar.txt > > [2017-03-25 02:32:27,832] {jobs.py:498} INFO - Getting list of tasks to > > skip for active runs. > > [2017-03-25 02:32:27,833] {jobs.py:514} INFO - Checking dependencies on 0 > > tasks instances, minus 0 skippable ones > > > > How this received a KeyboardInterrupt I have no idea, it's not even > running > > in a TTY. A short while later the log abruptly end with this: > > > > [2017-03-25 02:32:52,281] {jobs.py:498} INFO - Getting list of tasks to > > skip for active runs. 
> > [2017-03-25 02:32:52,281] {jobs.py:514} INFO - Checking dependencies on 0 > > tasks instances, minus 0 skippable ones > > [2017-03-25 02:32:52,306] {jobs.py:741} INFO - Done queuing tasks, > calling > > the executor's heartbeat > > [2017-03-25 02:32:52,306] {jobs.py:744} INFO - Loop took: 3.195292 > seconds > > [2017-03-25 02:32:52,322] {models.py:305} INFO - Finding 'running' jobs > > without a recent heartbeat > > [2017-03-25 02:32:52,322] {models.py:311} INFO - Failing jobs without > > heartbeat after 2017-03-25 02:30:37.322531 > > [2017-03-25 02:32:52,376] {local_executor.py:31} INFO - LocalWorker > running > > airflow run AUTOMATOR-sensor-v2 SENSOR--task-1 2017-03-25T02:25:00 > --local > > --pool sensor-pool -sd DAGS_FOLDER/automator-sensor.py > > [2017-03-25 02:32:52,763] {__init__.py:36} INFO - Using executor > > LocalExecutor > > [2017-03-25 02:32:52,816] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/Grammar.txt > > [2017-03-25 02:32:52,832] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/PatternGrammar.txt > > [2017-03-25 02:32:54,705] {__init__.py:36} INFO - Using executor > > LocalExecutor > > [2017-03-25 02:32:54,756] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/Grammar.txt > > [2017-03-25 02:32:54,773] {driver.py:120} INFO - Generating grammar > tables > > from /usr/lib/python2.7/lib2to3/PatternGrammar.txt > > Logging into: > > /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-2/2017-03-25T02:25:00 > > Logging into: > > /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-3/2017-03-25T02:25:00 > > Logging into: > > /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-4/2017-03-25T02:25:00 > > Logging into: > > /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-5/2017-03-25T02:25:00 > > Logging into: > > /opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-6/2017-03-25T02:25:00 > > Logging into: > > 
/opt/airflow/logs/AUTOMATOR-sensor-v2/SENSOR--task-1/2017-03-25T02:25:00 > > > > And that's the last log I have before restarting it. > > > > Not sure if this is at all helpful, > > -N > > nik.hodgkin...@collectivehealth.com > > > > > > On Mon, Mar 27, 2017 at 12:40 PM, Gerard Toonstra <gtoons...@gmail.com> > > wrote: > > > >> Any more info from grepping that log file? > >> > >> G> > >> > >> On Mon, Mar 27, 2017 at 9:26 PM, Nicholas Hodgkinson < > >> nik.hodgkin...@collectivehealth.com> wrote: > >> > >>> from airflow.cfg: > >>> > >>> [core] > >>> ... > >>> executor = LocalExecutor > >>> parallelism = 32 > >>> dag_concurrency = 16 > >>> dags_are_paused_at_creation = True > >>> non_pooled_task_slot_count = 128 > >>> max_active_runs_per_dag = 16 > >>> ... > >>> > >>> Pretty much the defaults; I've never tweaked these values. > >>> > >>> > >>> > >>> -N > >>> nik.hodgkin...@collectivehealth.com > >>> > >>> On Mon, Mar 27, 2017 at 12:12 PM, Gerard Toonstra <gtoons...@gmail.com > > > >>> wrote: > >>> > >>>> So looks like the localworkers are dying. Airflow does not recover > from > >>>> that. > >>>> > >>>> > >>>> In SchedulerJob (jobs.py), you can see the "_execute_helper" > function. > >>>> This calls "executor.start()", which is implemented > >>>> in local_executor.py in your case. > >>>> > >>>> The LocalExecutor is thus an object owned by the SchedulerJob. This > >>>> executor creates x (parallellism) LocalWorkers, > >>>> which derive from a multiprocessing.Process class. So the processes > you > >>> see > >>>> "extra" on the scheduler are those LocalWorkers > >>>> as child processes. The LocalWorkers create additional processes > >> through > >>> a > >>>> shell ("subprocess.check_call" with (shell=True)), > >>>> which are the things doing the actual work. 
> >>>> > >>>> > >>>> Before that, on my 'master' here, the LocalWorker issues a * > >>>> self.logger.info > >>>> <http://self.logger.info>("{} running {}" *, which you can find in > >> the > >>>> general > >>>> output of the scheduler log file. When starting the scheduler with > >>> "airflow > >>>> scheduler", it's what gets printed on the console and starts > >>>> with "Starting the scheduler". That is the file you want to > >> investigate. > >>>> > >>>> If anything bad happens with general processing, then it prints a: > >>>> > >>>> self.logger.error("failed to execute task > >>>> {}:".format(str(e))) > >>>> > >>>> in the exception handler. I'd grep for that "failed to execute task" > in > >>> the > >>>> scheduler log file I mentioned. > >>>> > >>>> > >>>> I'm not sure where stdout/stderr go for these workers. If the call > >>>> basically succeeded, but there were issues with the queue handling, > >>>> then I'd expect this to go to stderr instead. I'm not 100% sure if > that > >>>> gets sent to the same scheduler log file or whether that goes nowhere > >>>> because of it being a child process (they're probably inherited?). > >>>> > >>>> > >>>> One further question: what's your parallellism set to? I see 22 > >> zombies > >>>> left behind. Is that your setting? > >>>> > >>>> Let us know! > >>>> > >>>> Rgds, > >>>> > >>>> Gerard > >>>> > >>>> > >>>> > >>>> On Mon, Mar 27, 2017 at 8:13 PM, harish singh < > >> harish.sing...@gmail.com> > >>>> wrote: > >>>> > >>>>> 1.8: increasing DAGBAG_IMPORT_TIMEOUT helps. I don't see the issue > >>>>> (although not sure why tasks progress has become slow? But thats not > >>> the > >>>>> issue we are discussing here. So I am ignoring that here) > >>>>> > >>>>> 1.7: our prod is running 1.7 and we havent seen the "defunct > >> process" > >>>>> issue for more than a week now. 
But we saw something very close to > >> what > >>>>> Nicholas provided (localexecutor, we do not use --num-runs) > >>>>> Not sure if cpu/memory limit may lead to this issue. Often when we > >> hit > >>>> this > >>>>> issue (which stalled the pipeline), we either increased the memory > >>> and/or > >>>>> moved airflow to a bulkier (cpu) instance. > >>>>> > >>>>> Sorry for a late reply. Was out of town over the weekend. > >>>>> > >>>>> > >>>>> > >>>>> On Mon, Mar 27, 2017 at 10:47 AM, Nicholas Hodgkinson < > >>>>> nik.hodgkin...@collectivehealth.com> wrote: > >>>>> > >>>>>> 1.7.1.3, however it seems this is still an issue in 1.8 according > >> to > >>>>> other > >>>>>> posters. I'll upgrade today. > >>>>>> Yes, localexecutor. > >>>>>> Will remove -n 10 > >>>>>> > >>>>>> -N > >>>>>> nik.hodgkin...@collectivehealth.com > >>>>>> > >>>>>> > >>>>>> On Mon, Mar 27, 2017 at 10:40 AM, Bolke de Bruin < > >> bdbr...@gmail.com> > >>>>>> wrote: > >>>>>> > >>>>>>> Is this: > >>>>>>> > >>>>>>> 1. On 1.8.0? 1.7.1 is not supported anymore. > >>>>>>> 2. localexecutor? > >>>>>>> > >>>>>>> Your are running with N=10, can you try running without it? > >>>>>>> > >>>>>>> B. > >>>>>>> > >>>>>>> Sent from my iPhone > >>>>>>> > >>>>>>>> On 27 Mar 2017, at 10:28, Nicholas Hodgkinson <nik.hodgkinson@ > >>>>>>> collectivehealth.com> wrote: > >>>>>>>> > >>>>>>>> Ok, I'm not sure how helpful this is and I'm working on getting > >>>> some > >>>>>> more > >>>>>>>> information, but here's some preliminary data: > >>>>>>>> > >>>>>>>> Process tree (`ps axjf`): > >>>>>>>> 1 2391 2391 2391 ? -1 Ssl 999 0:13 > >>>>> /usr/bin/python > >>>>>>>> usr/local/bin/airflow scheduler -n 10 > >>>>>>>> 2391 2435 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2436 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2437 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2438 2391 2391 ? 
-1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2439 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2440 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2441 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2442 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2443 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2444 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2454 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2456 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2457 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2458 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2459 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2460 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2461 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2462 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2463 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2464 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2465 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> 2391 2466 2391 2391 ? -1 Z 999 0:00 \_ > >>>>>>>> [/usr/bin/python] <defunct> > >>>>>>>> > >>>>>>>> # gdb python 2391 > >>>>>>>> Reading symbols from python...Reading symbols from > >>>>>>>> /usr/lib/debug//usr/bin/python2.7...done. > >>>>>>>> done. 
> >>>>>>>> Attaching to program: /usr/bin/python, process 2391 > >>>>>>>> Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading > >>> symbols > >>>>>> from > >>>>>>>> /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done. > >>>>>>>> done. > >>>>>>>> Loaded symbols for /lib64/ld-linux-x86-64.so.2 > >>>>>>>> 0x00007f0c1bbb9670 in ?? () > >>>>>>>> (gdb) bt > >>>>>>>> #0 0x00007f0c1bbb9670 in ?? () > >>>>>>>> #1 0x00007f0c1bf1a000 in ?? () > >>>>>>>> #2 0x00007f0c12099b45 in ?? () > >>>>>>>> #3 0x00000000032dbe00 in ?? () > >>>>>>>> #4 0x0000000000000000 in ?? () > >>>>>>>> (gdb) py-bt > >>>>>>>> (gdb) py-list > >>>>>>>> Unable to locate python frame > >>>>>>>> > >>>>>>>> I know that's not super helpful, but it's information; I've > >> also > >>>>> tried > >>>>>>>> pyrasite, but got nothing from it of any use. This problem > >> occurs > >>>> for > >>>>>> me > >>>>>>>> very often and I'm happy to provide a modified environment in > >>> which > >>>>> to > >>>>>>>> capture info if anyone has a suggestion. For now I need to > >>> restart > >>>> my > >>>>>>>> process and get my jobs running again. > >>>>>>>> > >>>>>>>> -N > >>>>>>>> nik.hodgkin...@collectivehealth.com