Re: [modwsgi] Immediate Segmentation Fault in Daemon Mode

Chingis Dugarzhapov Mon, 22 Apr 2013 10:15:01 -0700

Hi, 

Thanks for the quick reply.

I am no longer sure that our case is related to the one at the start of this 
thread. The segfaults appeared on Apache startup and crashed only the first 
request handled by each worker. So, if we had 

WSGIDaemonProcess my_app processes=100 threads=1
WSGIProcessGroup my_app

this would lead to 100 groups of messages like this one in the error log:

[Tue Apr 16 12:47:53.863548 2013] [:info] [pid 24241:tid 139830256121600] mod_wsgi (pid=24241): Create interpreter 'my_app'.
[Tue Apr 16 12:47:53.880594 2013] [:info] [pid 24241:tid 139830256121600] [client 127.0.0.1:29952] mod_wsgi (pid=24241, process='my_app', application='my_app'): Loading WSGI script '/data/devsup/install/apache_pt1/htdocs/wsgi/platinum.wsgi'.
[Tue Apr 16 12:47:54.658533 2013] [core:notice] [pid 24181:tid 139830340458304] AH00052: child pid 24200 exit signal Segmentation fault (11)
[Tue Apr 16 12:47:54.658915 2013] [:info] [pid 24181:tid 139830340458304] mod_wsgi (pid=24200): Process 'my_app' has died, restarting.
[Tue Apr 16 12:47:54.660623 2013] [:info] [pid 26439:tid 139830340458304] mod_wsgi (pid=26439): Starting process 'my_app' with uid=3535, gid=3535 and threads=1.
[Tue Apr 16 12:47:54.662395 2013] [:info] [pid 26439:tid 139830340458304] mod_wsgi (pid=26439): Initializing Python.
    
even if the WSGI script is a hello world application.
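
For reference, a hello world WSGI script of the kind meant here might look 
like the sketch below. This is not the actual platinum.wsgi from the log; it 
is a minimal illustration that assumes only the standard WSGI callable 
interface, with the entry point named "application" as mod_wsgi expects by 
default.

```python
# Minimal hello world WSGI script (illustrative sketch, not the real
# platinum.wsgi). mod_wsgi looks for a callable named "application".


def application(environ, start_response):
    # The response body must be bytes; this literal works on Python 2.7 too.
    body = b'Hello World!\n'
    headers = [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ]
    start_response('200 OK', headers)
    # WSGI expects an iterable of byte strings.
    return [body]
```

A script this small exercises nothing beyond the request and response 
plumbing, so a crash on the first request points at the server side rather 
than at application code.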

Studying the core dumps with gdb (once we found out how to enable them) gave this:

GNU gdb (GDB) SUSE (7.3-0.6.1)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /data/devsup/install/apache_pt2/bin/httpd...done.
Core was generated by `modpt2  -k start'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000435a3b in update_child_status_internal ()
(gdb) where
#0  0x0000000000435a3b in update_child_status_internal ()
#1  0x0000000000454eb0 in ap_start_lingering_close ()
#2  0x0000000000454f22 in ap_lingering_close ()
#3  0x00007f7d1fe970ea in wsgi_daemon_thread () from /data/devsup/install/apache_pt2/modules/mod_wsgi.so
#4  0x00007f7d20d6c7b6 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f7d208c49cd in clone () from /lib64/libc.so.6

Our issue has just been solved by updating to mod_wsgi 3.5 (starting from 
changeset bdbeacb88f34, "Scoreboard handle in daemon mode should be set to 
NULL for Apache 2.4 to avoid crash in lingering close."):

Apache 2.4.4
Linux 3.0.13-0.27-default x86_64
mod_wsgi 3.5-BRANCH 
Python 2.7.2 shared
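
To double-check which mod_wsgi build is actually serving requests after such 
an upgrade, a small diagnostic WSGI script can help. The sketch below is an 
illustration only. It assumes the 'mod_wsgi.version' and 
'mod_wsgi.process_group' keys that mod_wsgi places in the WSGI environ, and 
it falls back to placeholder text when they are absent (for example when run 
under another WSGI server).

```python
# Diagnostic WSGI script (sketch): reports the mod_wsgi version and the
# daemon process group handling the request. The environ keys are read
# defensively because they only exist when running under mod_wsgi.


def application(environ, start_response):
    version = environ.get('mod_wsgi.version')
    if version:
        version_text = '.'.join(str(part) for part in version)
    else:
        version_text = 'unknown'

    # An empty process group name indicates embedded mode.
    group = environ.get('mod_wsgi.process_group', '')
    mode = 'daemon' if group else 'embedded or unknown'

    text = 'mod_wsgi version: %s\nprocess group: %r (%s)\n' % (
        version_text, group, mode)
    body = text.encode('utf-8')

    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

Requesting this script once per deployment is a cheap way to confirm that 
the expected module version was loaded and that the request was handled in 
daemon mode.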

Maybe this will help others.

Cheers (and patiently waiting for the official release ;) ),
Chingis


On Monday, 22 April 2013 14:00:29 UTC+2, Graham Dumpleton wrote:
>
> I had this drafted and ready to send when you did the followup. Will send 
> it anyway.
>
> Problem is that I cannot duplicate the problem. All works fine with Apache 
> 2.4.4 on MacOS X Mountain Lion.
>
> The previous person who had this did some analysis, but what they thought 
> was the issue shouldn't be one, as far as I understood what they were 
> suggesting.
>
> They believed the problem was related to the fact that Apache will process 
> the configuration more than once. What they thought was occurring was that 
> mod_wsgi was saving away various configuration from the first pass into 
> global variables where memory for those was allocated from a memory pool 
> that Apache then destroyed after the first pass of the configuration. I 
> guess they then figured that on the second pass of the configuration it 
> would try to access the data in those globals, which would then crash 
> because the memory pool had been destroyed.
>
> What was wrong with this analysis is that on the second pass of the 
> configuration, all those static global variables should have been 
> automatically reset to 0. This is because between those two configuration 
> passes, Apache would have unloaded the mod_wsgi.so object from memory and 
> then reloaded it. This should have had the effect of throwing away 
> everything mod_wsgi had done and starting over.
>
> If I do a test using:
>
> $ git diff
> diff --git a/mod_wsgi.c b/mod_wsgi.c
> index 1327e37..9a8fcf9 100644
> --- a/mod_wsgi.c
> +++ b/mod_wsgi.c
> @@ -9781,6 +9781,10 @@ static const char *wsgi_add_daemon_process(cmd_parms *cmd
>       * Apache configuration.
>       */
>  
> +    fprintf(stderr, "wsgi_add_daemon_process %ld\n", (long)cmd->pool);
> +    fprintf(stderr, "wsgi_daemon_list #1 %ld\n", (long)wsgi_daemon_list);
> +    fflush(stderr);
> +
>      uid = ap_unixd_config.user_id;
>      user = ap_unixd_config.user_name;
>  
> @@ -10123,6 +10127,9 @@ static const char *wsgi_add_daemon_process(cmd_parms *cm
>  
>      entry->listener_fd = -1;
>  
> +    fprintf(stderr, "wsgi_daemon_list #2 %ld\n", (long)wsgi_daemon_list);
> +    fflush(stderr);
> +
>      return NULL;
>  }
>  
> @@ -13659,6 +13666,10 @@ static int wsgi_hook_init(apr_pool_t *pconf, apr_pool_t
>  
>      int status = OK;
>  
> +    fprintf(stderr, "wsgi_hook_init %ld %ld\n", (long)pconf, (long)ptemp);
> +    fprintf(stderr, "wsgi_daemon_list %ld\n", (long)wsgi_daemon_list);
> +    fflush(stderr);
> +
>      /*
>       * Init function gets called twice during startup, we only
>       * need to actually do anything on the second time it is
>
> where WSGIDaemonProcess and WSGIProcessGroup each appear only once, I do 
> indeed see wsgi_add_daemon_process() called twice on each restart. For one 
> call the output appears in the shell window where I do the restart, and 
> for the other in the Apache error log.
>
> In all cases though (with only one WSGIDaemonProcess), the 
> wsgi_daemon_list global is always 0 when that single WSGIDaemonProcess 
> directive is processed. This indicates that mod_wsgi.so was correctly 
> unloaded and the static variables reset between the two configuration 
> passes.
>
> So right now that is not the problem.
>
> Graham
>
> On Apr 22, 2013, at 8:22 PM, Chingis Dugarzhapov <[email protected]> wrote:
>
> Hello guys,
>
> Just to confirm that the issue still persists in Apache 2.4.4, as well as 
> in 2.4.1 (mod_wsgi 3.4/3.5).
>
> The issue is quite embarrassing. The segfaults force reinitialization of 
> the whole mod_wsgi Python environment, which causes timeouts of several 
> seconds on the first requests. We are obliged to run a backup server during 
> production deployments in order to make the segfaults "pass through" with 
> dummy shooters.
>
> Graham, could you kindly provide some visibility on when the issue will 
> be fixed? (We cannot go back to 2.2 :( and of course embedded mode is *not* 
> a solution for us.)
>
> Cheers,
>
> --
> Chingis
>
>
> On Thursday, 28 March 2013 06:18:09 UTC+1, Graham Dumpleton wrote:
>>
>> There has been one other report of this with Apache 2.4.2.
>>
>> https://groups.google.com/d/topic/modwsgi/ehiiqqjQ6aA/discussion
>>
>> I didn't really have the time to look at it properly back then, but the 
>> code has always worked before with no issue and Apache wouldn't have 
>> changed how memory pools are managed when reading configuration. I would 
>> be surprised if I have managed to have a latent bug in there for so long.
>>
>> Any chance you can try Apache 2.4.4 and see if the issue goes away?
>>
>> Graham
>>
>>
>> On 28 March 2013 04:18, Don Tillman <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> With mod_wsgi 3.4, apache httpd 2.4.2, apr 1.4.6, apr_util 1.4.1.
>>>
>>> Embedded Mode seems to work well.   But I need Daemon Mode.
>>>
>>> When I add the WSGIProcessGroup directive to the config file and restart 
>>> Apache httpd, it blows out immediately with a segmentation fault with no 
>>> mention in the log.
>>>
>>> It happens in mod_wsgi.c, on line 10069, this:
>>>
>>>     entry->server = cmd->server;
>>>
>>>
>>> Debugging:
>>>
>>> --------
>>> gdb /usr/sbin/httpd 
>>> b wsgi_add_daemon_process
>>> run -X -f /etc/opt/tms/output/httpd.conf
>>> --------
>>>
>>>
>>> The first call to wsgi_add_daemon_process goes without incident.  It's 
>>> the second call that's a problem.
>>>
>>> --------
>>> Breakpoint 1, wsgi_add_daemon_process (cmd=0x7fffffffdb30, 
>>> mconfig=0x8cd0b0, args=0x8d08b8 "djangodaemon threads=15") at 
>>> mod_wsgi.c:9720
>>> 9720 mod_wsgi.c: No such file or directory.
>>>  in mod_wsgi.c
>>> --------
>>>
>>> Go to the line, examine the locals:
>>>
>>> --------
>>> (gdb) until 10069
>>> wsgi_add_daemon_process (cmd=0x7fffffffdb30, mconfig=0x8cd0b0, 
>>> args=0x8d08cf "") at mod_wsgi.c:10069
>>> 10069 in mod_wsgi.c
>>> (gdb) p *wsgi_daemon_list
>>> $1 = {pool = 0x80d138, elt_size = 280, nelts = 1, nalloc = 20, elts = 
>>> 0x944058 ""}
>>> (gdb) p *cmd->server
>>> $2 = {process = 0x80b218, next = 0x0, error_fname = 0x5890ce 
>>> "logs/error_log", error_log = 0x8809f8, log = {module_levels = 0x0, level = 
>>> 4}, module_config = 0x8c77c8, 
>>>   lookup_defaults = 0x8cbd78, defn_name = 0x0, defn_line_number = 0, 
>>> is_virtual = 0 '\000', port = 0, server_scheme = 0x0, server_admin = 
>>> 0x5890bb "[no address given]", 
>>>   server_hostname = 0x0, addrs = 0x880a70, timeout = 60000000, 
>>> keep_alive_timeout = 5000000, keep_alive_max = 100, keep_alive = 1, names = 
>>> 0x0, wild_names = 0x0, 
>>>   path = 0x0, pathlen = 0, limit_req_line = 8190, limit_req_fieldsize = 
>>> 8190, limit_req_fields = 100, context = 0x0}
>>> (gdb) p entry
>>> $3 = (WSGIProcessGroup *) 0x944058
>>> --------
>>>
>>> Okay.  Continue, for the next call.
>>>
>>> --------
>>> (gdb) c
>>> Continuing.
>>> Missing separate debuginfo for /lib64/libnss_files.so.2
>>> Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install 
>>> /var/lib/debug/.build-id/19/c24fb834453511d12dfb4283d3c70f1346e974.debug
>>>
>>> Breakpoint 1, wsgi_add_daemon_process (cmd=0x7fffffffdb30, 
>>> mconfig=0x852730, args=0x8d14a0 "djangodaemon threads=15") at 
>>> mod_wsgi.c:9720
>>> 9720 in mod_wsgi.c
>>> --------
>>>
>>> Go to the line, examine the locals.
>>>
>>> --------
>>> (gdb) until 10069
>>> wsgi_add_daemon_process (cmd=0x7fffffffdb30, mconfig=0x852730, 
>>> args=0x8d14b7 "") at mod_wsgi.c:10069
>>> 10069 in mod_wsgi.c
>>> (gdb) p *wsgi_daemon_list
>>> $4 = {pool = 0x5929f0, elt_size = 10, nelts = 1, nalloc = 4567695, elts 
>>> = 0x592b78 "mod_file_cache.c"}
>>> (gdb) p *cmd->server
>>> $5 = {process = 0x80b218, next = 0x0, error_fname = 0x5890ce 
>>> "logs/error_log", error_log = 0x845cb8, log = {module_levels = 0x0, level = 
>>> 4}, module_config = 0x845ee0, 
>>>   lookup_defaults = 0x8513f8, defn_name = 0x0, defn_line_number = 0, 
>>> is_virtual = 0 '\000', port = 0, server_scheme = 0x0, server_admin = 
>>> 0x5890bb "[no address given]", 
>>>   server_hostname = 0x0, addrs = 0x845d30, timeout = 60000000, 
>>> keep_alive_timeout = 5000000, keep_alive_max = 100, keep_alive = 1, names = 
>>> 0x0, wild_names = 0x0, 
>>>   path = 0x0, pathlen = 0, limit_req_line = 8190, limit_req_fieldsize = 
>>> 8190, limit_req_fields = 100, context = 0x0}
>>> (gdb) p entry
>>> $6 = (WSGIProcessGroup *) 0x592b78
>>> --------
>>>
>>> Continue, and blammo...
>>>
>>> --------
>>> (gdb) c
>>> Continuing.
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x00000000004fcdd6 in wsgi_add_daemon_process (cmd=0x7fffffffdb30, 
>>> mconfig=0x852730, args=0x8d14b7 "") at mod_wsgi.c:10069
>>> 10069 in mod_wsgi.c
>>> (gdb) 
>>> --------
>>>
>>> So it's not the usual kind of null pointer segv situation.  The 
>>> assignment on line 10069 looks to me like it should work.
>>>
>>> I did notice that wsgi_daemon_list's nalloc value looks mighty large 
>>> the second time around; perhaps that's an issue.
>>>
>>> Any ideas?  This is very frustrating.
>>>
>>>   -- Don
>>>
>>>
>>>
>>>
>>>
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "modwsgi" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/modwsgi?hl=en.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>  
>>>  
>>>
>>
>>
