Alex,

Did you ever have success using _exit() instead of exit()?

Graham

On 15/05/2014, at 10:15 AM, Graham Dumpleton <[email protected]> wrote:

> That would certainly cause execution of destructors for global C++ objects to
> be skipped. It will though also skip atexit() callbacks and possibly things
> to do with flushing C FILE objects.
>
> Skipping those other things may not matter, so it may work as an interim
> solution until I can see whether proper destruction of memory pools on process
> shutdown will avoid the issue.
>
> Graham
>
> On 15/05/2014, at 5:20 AM, Alex Wu <[email protected]> wrote:
>
>> Got a suggestion from the mod_pagespeed project: call _exit instead of exit.
>> I'll test it out to see if the segmentation fault would be gone.
>>
>> Alex
>>
>> On Monday, May 12, 2014 6:28:28 PM UTC-7, Graham Dumpleton wrote:
>>
>> On 13/05/2014, at 11:13 AM, Alex Wu <[email protected]> wrote:
>>
>>> My question is whether mod_wsgi should wipe out all memory inherited from
>>> the parent once it forks?
>>
>> It can't. It relies on it being a fork (and not a fork/exec) to inherit
>> everything.
>>
>>> I am not clear: if a module inherits a C++ object from the parent, does it
>>> trigger a destructor call?
>>
>> Most likely it would.
>>
>> What I don't know is if you unload a module does that bypass execution of
>> finaliser sections.
>>
>> I would imagine it cannot bypass them, else memory from the heap would not
>> be released if referenced by global C++ objects, and you would
>> get a potential memory leak.
>>
>> This may not matter on process shutdown, but would during an Apache restart
>> as Apache will unload and reload modules when that occurs.
>>
>> So although in the Apache parent it does appear to unload modules on process
>> shutdown:
>>
>>     /*
>>      * Register a cleanup in the config apr_pool_t (normally pconf). When
>>      * we do a restart (or shutdown) this cleanup will cause the
>>      * shared object to be unloaded.
>>      */
>>     apr_pool_cleanup_register(cmd->pool, modi, unload_module,
>>                               apr_pool_cleanup_null);
>>
>>
>> int main(...) {
>>     ...
>>
>>     destroy_and_exit_process(process, 0);
>>
>>     return 0; /* Termination 'ok' */
>> }
>>
>> static void destroy_and_exit_process(process_rec *process,
>>                                      int process_exit_value)
>> {
>>     /*
>>      * Sleep for TASK_SWITCH_SLEEP micro seconds to cause a task switch on
>>      * OS layer and thus give possibly started piped loggers a chance to
>>      * process their input. Otherwise it is possible that they get killed
>>      * by us before they can do so. In this case maybe valueable log messages
>>      * might get lost.
>>      */
>>     apr_sleep(TASK_SWITCH_SLEEP);
>>     apr_pool_destroy(process->pool); /* and destroy all descendent pools */
>>     apr_terminate();
>>     exit(process_exit_value);
>> }
>>
>> doing that may not help and may just trigger it at that point instead.
>>
>> I will though need to look into whether I should introduce something similar
>> just prior to calling exit() in the daemon processes.
>>
>> I would have to be very careful about what pools I destroy though. Or
>> perhaps work out how just to trigger cleanup routines on selected pools.
>>
>> Graham
>>
>>> Alex
>>>
>>> On Monday, May 12, 2014 5:42:27 PM UTC-7, Graham Dumpleton wrote:
>>> Okay. So this isn't an atexit() callback but global C++ object destructors
>>> kicking in from the automatic execution of finaliser sections on the object
>>> files.
>>>
>>> Same issue applies though in part. It looks like the page speed module
>>> could be making some assumption that certain data will always be
>>> initialised by the time the process is terminated, but possibly because
>>> Apache module child init handlers are not called for the page speed module
>>> in the mod_wsgi daemon processes, then that data isn't initialised and as a
>>> result it crashes.
>>>
>>> When this happens though it is usual to see a NULL pointer dereference or
>>> low memory access due to relative reference to NULL pointer.
>>> I can't see an
>>> obvious case of that, but is hard to tell what the module is doing.
>>>
>>> Another problem with this thought is that since the page speed module
>>> doesn't get to do anything at all in the mod_wsgi daemon mode process, then
>>> can't see how this issue wouldn't also arise in the Apache parent process
>>> unless the fact that the module might be unloaded from memory by Apache
>>> first before shutdown (can't remember) might mean that global C++
>>> destructors aren't called in that case.
>>>
>>> Now one could argue that if this is happening that the page speed module is
>>> being sloppy, but at the same time, under normal circumstances an Apache
>>> module would never need to contend with possibility that something like the
>>> Apache child init handler might not be called in a child process. That is
>>> an oddity caused by mod_wsgi daemon mode.
>>>
>>> Anyway, all can do right now is confirm whether it is the page speed module
>>> by disabling that module temporarily.
>>>
>>> Will then need to work out what to do and perhaps raise issue with page
>>> speed module authors if that is where it is arising and see if they want to
>>> say not their problem since mod_wsgi does weird stuff. :-)
>>>
>>> Graham
>>>
>>> On 13/05/2014, at 9:51 AM, Alex Wu <[email protected]> wrote:
>>>
>>>> Here is one example:
>>>>
>>>> warning: Can't read pathname for load map: Input/output error.
>>>> [Thread debugging using libthread_db enabled]
>>>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>>> Core was generated by `(wsgi:dataplane) -D Dataplane -D
>>>> pagespeed -D fwd_proxy -D DAT'.
>>>> Program terminated with signal 11, Segmentation fault.
>>>> #0 0x00007fb46da18e4a in ??
>>>> () from /usr/lib/libpython2.7.so.1.0
>>>> (gdb) info threads
>>>> Id Target Id Frame
>>>> 5 Thread 0x7fb458fe9700 (LWP 25847) 0x00007fb4777566e0 in sigprocmask
>>>> () from /lib/x86_64-linux-gnu/libc.so.6
>>>> 4 Thread 0x7fb4785bb740 (LWP 22886) 0x00007fb46cadd678 in (anonymous
>>>> namespace)::scribble (ptr=0x7fb478f13a38, size=34008,
>>>> scribble_word=-559038737)
>>>> at pagespeed/kernel/base/mem_debug.cc:81
>>>> 3 Thread 0x7fb469870700 (LWP 22895) 0x00007fb477814a93 in epoll_wait
>>>> () from /lib/x86_64-linux-gnu/libc.so.6
>>>> 2 Thread 0x7fb46aa76700 (LWP 22893) 0x00007fb47780d763 in select ()
>>>> from /lib/x86_64-linux-gnu/libc.so.6
>>>> * 1 Thread 0x7fb46a071700 (LWP 22894) 0x00007fb46da18e4a in ?? () from
>>>> /usr/lib/libpython2.7.so.1.0
>>>> (gdb) thread 4
>>>> [Switching to thread 4 (Thread 0x7fb4785bb740 (LWP 22886))]
>>>> #0 0x00007fb46cadd678 in (anonymous namespace)::scribble
>>>> (ptr=0x7fb478f13a38, size=34008, scribble_word=-559038737) at
>>>> pagespeed/kernel/base/mem_debug.cc:81
>>>> 81 pagespeed/kernel/base/mem_debug.cc: No such file or directory.
>>>> (gdb) bt
>>>> #0 0x00007fb46cadd678 in (anonymous namespace)::scribble
>>>> (ptr=0x7fb478f13a38, size=34008, scribble_word=-559038737) at
>>>> pagespeed/kernel/base/mem_debug.cc:81
>>>> #1 0x00007fb46cadd827 in (anonymous namespace)::debug_free
>>>> (ptr=0x7fb478f13a38) at pagespeed/kernel/base/mem_debug.cc:100
>>>> #2 0x00007fb46cadd9f9 in operator delete[] (ptr=0x7fb478f13a38) at
>>>> pagespeed/kernel/base/mem_debug.cc:142
>>>> #3 0x00007fb46ce2256e in re2::Prog::~Prog (this=0x7fb478c260e8,
>>>> __in_chrg=<optimized out>) at third_party/re2/src/re2/prog.cc:123
>>>> #4 0x00007fb46cdf5402 in re2::RE2::~RE2 (this=0x7fb478ff3dd8,
>>>> __in_chrg=<optimized out>) at third_party/re2/src/re2/re2.cc:272
>>>> #5 0x00007fb46d1033af in
>>>> pagespeed::js::JsTokenizerPatterns::~JsTokenizerPatterns
>>>> (this=0x7fb478ff3dd8, __in_chrg=<optimized out>)
>>>> at pagespeed/kernel/js/js_tokenizer.cc:1096
>>>> #6 0x00007fb46cf9f00c in
>>>> base::DefaultDeleter<pagespeed::js::JsTokenizerPatterns>::operator()
>>>> (this=0x7fb46d6a6fe8, ptr=0x7fb478ff3dd8)
>>>> at third_party/chromium/src/base/memory/scoped_ptr.h:137
>>>> #7 0x00007fb46cf9efc2 in
>>>> base::internal::scoped_ptr_impl<pagespeed::js::JsTokenizerPatterns,
>>>> base::DefaultDeleter<pagespeed::js::JsTokenizerPatterns>
>>>> >::~scoped_ptr_impl
>>>> (this=0x7fb46d6a6fe8, __in_chrg=<optimized out>) at
>>>> third_party/chromium/src/base/memory/scoped_ptr.h:220
>>>> #8 0x00007fb46cf9ef6c in scoped_ptr<pagespeed::js::JsTokenizerPatterns,
>>>> base::DefaultDeleter<pagespeed::js::JsTokenizerPatterns> >::~scoped_ptr
>>>> (this=0x7fb46d6a6fe8,
>>>> __in_chrg=<optimized out>) at
>>>> third_party/chromium/src/base/memory/scoped_ptr.h:310
>>>> #9 0x00007fb46cf9ef33 in net_instaweb::ProcessContext::~ProcessContext
>>>> (this=0x7fb46d6a6fe8, __in_chrg=<optimized out>) at
>>>> net/instaweb/rewriter/process_context.cc:54
>>>> #10 0x00007fb46cad3969 in net_instaweb::(anonymous
>>>> namespace)::ApacheProcessContext::~ApacheProcessContext
>>>> (this=0x7fb46d6a6fe0, __in_chrg=<optimized out>)
>>>> at net/instaweb/apache/mod_instaweb.cc:313
>>>> #11 0x00007fb47775b901 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
>>>> #12 0x00007fb47775b985 in exit () from /lib/x86_64-linux-gnu/libc.so.6
>>>> #13 0x00007fb46dddfd96 in wsgi_start_process (p=<optimized out>,
>>>> daemon=<optimized out>) at mod_wsgi.c:11969
>>>> #14 0x00007fb46dde1344 in wsgi_start_daemons (p=0x7fb478bac138) at
>>>> mod_wsgi.c:12166
>>>> #15 wsgi_hook_init (pconf=0x7fb478bac138, ptemp=<optimized out>,
>>>> plog=<optimized out>, s=<optimized out>) at mod_wsgi.c:13737
>>>> #16 0x00007fb478633113 in ap_run_post_config (pconf=0x7fb478bac138,
>>>> plog=0x7fb478bd9378, ptemp=0x7fb478bd7348, s=0x7fb478bd5538) at
>>>> config.c:106
>>>> #17 0x00007fb478608993 in main (argc=15, argv=0x7fff2ee8cfd8) at main.c:765
>>>>
>>>> On Monday, May 12, 2014 4:07:35 PM UTC-7, Graham Dumpleton wrote:
>>>> Can you point out to me where in the Apache 2.4 code base it calls
>>>> atexit() to register anything on process shutdown?
>>>>
>>>> Neither Apache nor the underlying APR/APU libraries that it uses rely on
>>>> atexit() to have anything triggered on process shutdown that I know of and
>>>> I cannot find anything in the code I have handy for those which uses
>>>> atexit() in such a generic way.
>>>>
>>>> Normally Apache relies on cleanup actions attached to deletion of memory
>>>> pools and not atexit(). Thus it requires orderly Apache process shutdown
>>>> and for memory pools to be destroyed for actions to be performed on
>>>> process shutdown. The destruction of memory pools is not triggered via
>>>> atexit().
>>>>
>>>> Do you also have a more extensive stack trace than that one line so I can
>>>> see in what actual code the crash occurs? That may give me more clues.
>>>>
>>>> Graham
>>>>
>>>> On 13/05/2014, at 8:58 AM, Alex Wu <[email protected]> wrote:
>>>>
>>>>> We do not specifically add a hook to atexit.
>>>>> It is called/triggered by the
>>>>> Apache framework when a module is written within the Apache 2.4
>>>>> framework. Also, mod_pagespeed uses a scoped pointer on their server
>>>>> context; it triggers auto cleanup once exit is called and the library
>>>>> is unloaded.
>>>>>
>>>>> Alex
>>>>>
>>>>>
>>>>>
>>>>> On Monday, May 12, 2014 3:40:26 PM UTC-7, Graham Dumpleton wrote:
>>>>> If your own Apache modules are using atexit() to perform cleanup on
>>>>> process exit, rather than Apache's own mechanisms for performing cleanup
>>>>> actions when the pool the module uses is cleaned up, then the atexit()
>>>>> callback will have to take into consideration that under mod_wsgi when
>>>>> using daemon mode, that the Apache module child init handler will not be
>>>>> called in the daemon process for your Apache module. Thus the callback
>>>>> should check whether global data pointers are in fact non NULL before
>>>>> trying to do things with them.
>>>>>
>>>>> Can you confirm you are using atexit() callbacks in C code with your
>>>>> Apache modules and explain at what point you are registering the callback
>>>>> with atexit()?
>>>>>
>>>>> Is there a specific reason you are using atexit() callbacks rather than
>>>>> doing the normal thing of in the Apache module child init handler
>>>>> registering a cleanup callback on the memory pool given to the Apache
>>>>> module on child init and relying on that being triggered by Apache when
>>>>> shutting things down?
>>>>>
>>>>> Graham
>>>>>
>>>>> On 13/05/2014, at 8:23 AM, Alex Wu <[email protected]> wrote:
>>>>>
>>>>>> Some are our own, one is mod_pagespeed. We use Python 2.7.3 with Apache
>>>>>> 2.4.7 in MPM mode. The segmentation fault is in the cleanup routine of
>>>>>> each module other than mod_wsgi after the exit call.
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>> On Monday, May 12, 2014 1:50:35 PM UTC-7, Graham Dumpleton wrote:
>>>>>> On 13/05/2014, at 4:40 AM, Alex Wu <[email protected]> wrote:
>>>>>>
>>>>>> > We have observed various segmentation fault caused by exit call from
>>>>>> > mod_wsgi 3.5:
>>>>>> >
>>>>>> > #20 0x00007f9490a94d96 in wsgi_start_process (p=<optimized out>,
>>>>>> > daemon=<optimized out>) at mod_wsgi.c:11969
>>>>>> >
>>>>>> > The exit call triggers cleanup from other modules, that cleanup caused
>>>>>> > segmentation fault,
>>>>>>
>>>>>> What version of Apache and Python are you using?
>>>>>>
>>>>>> What other non standard Apache modules are you using? For example, is
>>>>>> PHP being used in the same Apache instance?
>>>>>>
>>>>>> Graham
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "modwsgi" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>> an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at http://groups.google.com/group/modwsgi.
>>>>>> For more options, visit https://groups.google.com/d/optout.
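
For anyone following the thread, the difference between exit() and _exit() described above can be reproduced outside Apache. The sketch below is not mod_wsgi or mod_pagespeed code; GlobalState, on_exit_callback and run_child are made-up names for illustration, and it assumes a POSIX system. It forks a child (as the daemon process is forked), lets the child terminate with either call, and returns whatever the child's shutdown handlers managed to write to a pipe.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Write end of the report pipe; stays -1 in the parent, set only in the child.
static int g_report_fd = -1;

static void report(const char* msg, size_t len) {
    if (g_report_fd >= 0) {
        ssize_t ignored = write(g_report_fd, msg, len);
        (void)ignored;
    }
}

// Hypothetical stand-in for a module's global C++ object. Its destructor
// runs during exit() in any process that inherited it across fork().
struct GlobalState {
    ~GlobalState() { report("dtor;", 5); }
};
static GlobalState g_state;

static void on_exit_callback() { report("atexit;", 7); }

// Forks a child that terminates with exit() or _exit() and returns the
// text written by the child's atexit() callback and global destructor.
std::string run_child(bool use_underscore_exit) {
    int fds[2];
    if (pipe(fds) != 0) return "pipe failed";
    pid_t pid = fork();
    if (pid < 0) return "fork failed";
    if (pid == 0) {
        close(fds[0]);
        g_report_fd = fds[1];
        std::atexit(on_exit_callback);
        if (use_underscore_exit) _exit(0);  // skips atexit() and destructors
        std::exit(0);                       // runs both
    }
    close(fds[1]);
    std::string out;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        out.append(buf, static_cast<size_t>(n));
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status));
    return out;
}
```

Calling run_child(false) should return text containing both the "atexit;" and "dtor;" markers, while run_child(true) should return an empty string, which is why _exit() hides the crash but also skips any cleanup that was actually wanted.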

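
The advice earlier in the thread, that a shutdown callback should check its global pointers before using them because the child init handler may never have run in a mod_wsgi daemon process, can be sketched in the same way. Everything here is hypothetical: ModuleState, module_child_init and module_cleanup are illustrative names, not the real Apache hook API or anything from mod_pagespeed.

```cpp
#include <cstddef>

// Hypothetical per-process state that a module would normally create in
// its child init handler.
struct ModuleState {
    int open_handles;
};

static ModuleState* g_module_state = nullptr;

// Stand-in for the child init handler. Under mod_wsgi daemon mode this
// may never be called for other modules in the daemon process.
void module_child_init() {
    if (g_module_state == nullptr) {
        g_module_state = new ModuleState{3};
    }
}

// Cleanup that is safe whether or not init ran, and safe to call twice.
// Returns true only if there was state to release.
bool module_cleanup() {
    if (g_module_state == nullptr) {
        return false;  // init never ran in this process: nothing to do
    }
    delete g_module_state;
    g_module_state = nullptr;
    return true;
}
```

The same guard applies whether the cleanup is reached from an atexit() callback, a global destructor, or a pool cleanup: it only touches state that was actually set up in the current process.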