[modwsgi] Re: Random segmentation fault errors

inaf Sun, 10 Jan 2010 21:01:57 -0800


On Jan 10, 8:01 pm, Graham Dumpleton <[email protected]>
wrote:
> 2010/1/8 inaf <[email protected]>:
>
> > Regarding the pids not matching, I found out that the seg fault is for
> > the SiteMinder agent pid, but there is an interesting coincidence: the
> > mod_wsgi daemon throws these errors and shortly afterwards the seg fault
> > comes for the SiteMinder agent. I also confirmed that this is not always
> > the case, so the mod_wsgi errors are not always followed by a seg fault
> > error.
>
> For SiteMinder, you have a mod_siteminder module loaded into
> Apache, correct?


I believe so. This is the line in the config:

LoadModule sm_module "/usr/netegrity/siteminder6qmr5/webagent/bin/libmod_sm20.so"

>
> This still stinks of memory corruption. Specifically, some Apache
> module is keeping a pointer to memory contained in an Apache memory
> pool after the memory pool is released. That memory area is then being
> given out to mod_wsgi as per-request scratch space. The other Apache
> module is then scribbling in the memory.
>
> What doesn't make sense is why it is always replacing it with
> '<script_name>'. Are you doing that to protect what the actual paths
> are or is that really what the error log files say?
>
> If it is always that, then perhaps it is not memory corruption but some
> Apache module deliberately updating r->filename in the Apache request
> structure and replacing it with a new string encompassing that string.
>
> Can you clarify whether you are modifying the logs or whether that is
> the actual value?

I am replacing the script name when I paste the lines to protect the
actual paths.

As I mentioned in one of my previous posts, this script accesses a
singleton object (well, not quite a singleton, but close) and reads
from that class's variables. I am not sure whether synchronization is
required, but I even tried it with a lock, still with no luck. The
script is included via SSI three times on a page, with different
parameters, so a failure does not break the whole page; in those three
places where it displays content, I think we are getting SSI error
messages, although I have not seen this myself. (I took the "unable to
include ..." SSI errors out of the log lines I pasted earlier so as not
to confuse you, as I don't think they have anything to do with it.) We
are putting a script in place to hit the page every so often and scrape
the content, hoping that in one of those runs we will get this error
and be able to see what a user would see. The reason for doing this is
that I was able to reproduce the error with another WSGI script that
refreshes these class variables every once in a while via a job. I
continuously hit this URL in my browser and got the error in the logs,
but the response was not always a 500; in some cases the result came
back successfully. So I want to see whether users really see a
problem, or whether Apache somehow retries the SSI include of the
script and succeeds.
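The read-mostly class-variable setup described above can be sketched as follows. This is only an illustration under assumptions: the class name `SharedData`, its `get`/`refresh` helpers and the `loader` callable are hypothetical, not taken from the actual scripts. The idea is that the refresh job builds a complete new dictionary and then replaces the class attribute with a single assignment, so readers always see either the old or the new snapshot, never a half-updated one. In general, a data race in pure Python code shows up as inconsistent data rather than a segmentation fault, so this is about the correctness of the data, not a likely fix for the crash.

```python
import threading


class SharedData:
    """Hypothetical read-mostly cache held in class variables.

    Readers take a reference to the current snapshot; the refresh job
    builds a new dict off to the side and swaps it in with one assignment.
    """

    _snapshot = {}                    # current data, replaced wholesale on refresh
    _refresh_lock = threading.Lock()  # serializes concurrent refreshes only

    @classmethod
    def get(cls, key, default=None):
        # Grab one reference to the current snapshot; a concurrent refresh
        # rebinds cls._snapshot but never mutates the dict we are holding.
        snapshot = cls._snapshot
        return snapshot.get(key, default)

    @classmethod
    def refresh(cls, loader):
        # `loader` is a hypothetical callable returning the fresh data.
        with cls._refresh_lock:
            new_snapshot = dict(loader())  # build the complete copy first
            cls._snapshot = new_snapshot   # then publish it in one step


def application(environ, start_response):
    # Hypothetical WSGI entry point that only reads from the shared cache.
    body = str(SharedData.get("greeting", "n/a")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

With `WSGIApplicationGroup %{GLOBAL}` and a single daemon process, the reading script and the refresh script run in the same interpreter and can share these class variables, provided the class lives in an ordinary module that both scripts import (mod_wsgi loads each WSGI script file as its own module, so a class defined directly in one script file is not shared with the other). With more than one daemon process, each process holds its own copy.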
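The page-scraping check mentioned above could look something like this minimal sketch. It is written for a current Python 3 rather than the server's Python 2.5, since it runs as a separate client. It assumes the site uses Apache's default SSI error text (mod_include substitutes `[an error occurred while processing this directive]` unless `SSIErrorMsg` changes it); the function names and the output file naming are hypothetical.

```python
import time
import urllib.error
import urllib.request

# Default text mod_include inserts when an SSI directive fails.
# Adjust this if the site overrides it with SSIErrorMsg.
SSI_ERROR_MARKER = "[an error occurred while processing this directive]"


def check_page(body, status):
    """Return a list of problems found in one fetched page."""
    problems = []
    if status != 200:
        problems.append("HTTP status %s" % status)
    count = body.count(SSI_ERROR_MARKER)
    if count:
        problems.append("%d failed SSI include(s)" % count)
    return problems


def fetch(url, timeout=10):
    """Fetch `url`, returning (status, body); status 0 means no HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a body worth saving.
        return err.code, err.read().decode("utf-8", "replace")
    except OSError as err:
        # Connection errors and timeouts (URLError is a subclass of OSError).
        return 0, str(err)


def poll(url, interval=30, runs=None):
    """Fetch the page every `interval` seconds and save any bad responses."""
    n = 0
    while runs is None or n < runs:
        n += 1
        status, body = fetch(url)
        problems = check_page(body, status)
        if problems:
            stamp = time.strftime("%Y%m%d-%H%M%S")
            with open("bad-response-%s.html" % stamp, "w", encoding="utf-8") as f:
                f.write(body)
            print(stamp, url, "; ".join(problems))
        time.sleep(interval)
```

For example, `poll("http://localhost/page-with-includes.shtml", interval=60)` (a placeholder URL) would save a copy of every page that came back with a non-200 status or with failed includes, which shows exactly what a user would have seen.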

The other approach I have in mind is to look at what is going through
the UNIX socket between the mod_wsgi daemon and Apache. I am not quite
sure whether that is possible, but I am still investigating.
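Short of capturing the socket traffic itself, a lower-effort check is to make a request report which daemon process served it, so the pid can be matched against the `mod_wsgi (pid=...)` alerts in the error log. The following is a minimal, hypothetical diagnostic script (not one of the existing scripts), assumed to be mounted in the same `WSGIProcessGroup`; it reads the `mod_wsgi.process_group` and `mod_wsgi.application_group` environ keys with `.get()` in case they are absent. It also shows the `wsgi.multiprocess` flag discussed further down in the thread.

```python
import os


def application(environ, start_response):
    """Diagnostic WSGI app: report which process and groups served this request."""
    lines = [
        "pid: %d" % os.getpid(),
        # Set by mod_wsgi; .get() keeps the sketch usable under other servers.
        "process_group: %s" % environ.get("mod_wsgi.process_group", "?"),
        "application_group: %s" % environ.get("mod_wsgi.application_group", "?"),
        # Standard WSGI flags describing how the server may run the app.
        "wsgi.multiprocess: %s" % environ.get("wsgi.multiprocess"),
        "wsgi.multithread: %s" % environ.get("wsgi.multithread"),
    ]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```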

Another approach is to disable the SiteMinder agent and test in a
pre-production environment with a load generator to see whether we
still get these errors. This would (hopefully) help rule out
SiteMinder.
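To compare a load-test run with the SiteMinder agent disabled against one with it enabled, the error_log can be summarized by counting the failure signatures that appear in the excerpts quoted below. This is a minimal sketch: the signature names are made up for illustration, and the patterns only cover the messages seen in this thread.

```python
import re
import sys
from collections import Counter

# Message fragments taken from the error_log excerpts quoted in this thread.
PATTERNS = {
    "origin_not_validated": re.compile(
        r"mod_wsgi \(pid=\d+\): Request origin could not be validated"),
    "premature_end": re.compile(r"Premature end of script headers"),
    "segfault": re.compile(r"exit signal Segmentation fault"),
    "glibc_corruption": re.compile(r"\*\*\* glibc detected \*\*\*"),
}


def count_errors(lines):
    """Count how many log lines match each known failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_errors.py path/to/error_log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for name, n in sorted(count_errors(f).items()):
            print("%-22s %d" % (name, n))
```

Running it over the error_log from each test run gives directly comparable totals for the two configurations.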


Nonetheless, it is quite a strange problem, and having these errors in
the logs is not good, since it means we are not defect-free. I really
do not want to throw away what I have implemented, as it takes
advantage of the unique strengths of Python and is unbelievably fast
and not resource-intensive at all, thanks to mod_wsgi. I really
appreciate your work and have so much respect for what you are doing.
Thanks once again for making my idea a reality... well, almost there :)

Regards,
-Cem

>
> Graham
>
>
>
> > As far as the error logs are concerned, these are pretty much the only
> > ones, along with the SiteMinder lines. Please see below:
>
> > After upgrade:
>
> > (nothing before these lines for a while)
> > [Thu Jan 07 13:57:05 2010] [alert] mod_wsgi (pid=21148): Request origin could not be validated.
> > [Thu Jan 07 13:57:05 2010] [error] [client 3.49.42.185] Premature end of script headers: <script_name>.wsgi
> > [Thu Jan 07 13:57:06 2010] [notice] child pid 31237 exit signal Segmentation fault (11)
> > (nothing after the lines above for a while)
>
> > ..........
>
> > [07/Jan/2010:14:46:28] [Information] SiteMinder Agent
> >        SiteMinder agent is enabled.
> > [07/Jan/2010:14:46:28] [Information] SiteMinder Agent
> >        Configuration file path:
> >        '/appl/apache1/conf/WebAgent.conf'.
> > [Thu Jan 07 14:46:28 2010] [alert] mod_wsgi (pid=21148): Request origin could not be validated.
> > [Thu Jan 07 14:46:28 2010] [error] [client 3.49.42.185] Premature end of script headers: <script_name>.wsgi
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 676200496 using key 0x6b000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSharedSegment::smalloc] Attached to shared memory segment 550469672 using key 0x6c000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 681279577 using key 0xc8000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 681279577 using key 0xc8000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 676397111 using key 0x66000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 676429880 using key 0x67000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSharedSegment::smalloc] Attached to shared memory segment 550633517 using key 0x65000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 676462649 using key 0x68000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 676495418 using key 0x69000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 676266034 using key 0x32000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSharedSegment::smalloc] Attached to shared memory segment 550502441 using key 0x61000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 676331573 using key 0x33000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSharedSegment::smalloc] Attached to shared memory segment 550567979 using key 0x62000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 676364342 using key 0x34000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSharedSegment::smalloc] Attached to shared memory segment 550600748 using key 0x63000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSem::getSem] Attached to semaphore 676298804 using key 0x6a000dd5
> > [07/Jan/2010:14:46:29] [Info] [CA WebAgent IPC] [10627] [CSmSharedSegment::smalloc] Attached to shared memory segment 550535210 using key 0x69000dd5
> > [07/Jan/2010:14:46:29] [Information] SiteMinder Agent
> >        SiteMinder agent is running.
> > [Thu Jan 07 14:49:06 2010] [alert] mod_wsgi (pid=21148): Request origin could not be validated.
> > [Thu Jan 07 14:49:06 2010] [error] [client 3.49.42.185] Premature end of script headers: <script_name>.wsgi
>
> > ........
>
> > [Thu Jan 07 15:00:03 2010] [alert] mod_wsgi (pid=21148): Request origin could not be validated.
> > [Thu Jan 07 15:00:03 2010] [error] [client 3.49.42.185] Premature end of script headers: <script_name>.wsgi
> > [Thu Jan 07 15:00:08 2010] [alert] mod_wsgi (pid=21148): Request origin could not be validated.
> > [Thu Jan 07 15:00:08 2010] [error] [client 3.49.42.185] Premature end of script headers: <script_name>.wsgi
> > [Thu Jan 07 15:00:45 2010] [alert] mod_wsgi (pid=21148): Request origin could not be validated.
> > [Thu Jan 07 15:00:45 2010] [error] [client 3.49.42.185] Premature end of script headers: <script_name>.wsgi
> > [Thu Jan 07 15:00:50 2010] [alert] mod_wsgi (pid=21148): Request origin could not be validated.
> > [Thu Jan 07 15:00:50 2010] [error] [client 3.49.42.185] Premature end of script headers: <script_name>.wsgi
>
> > .....
>
> > [Thu Jan 07 16:33:42 2010] [alert] mod_wsgi (pid=14480): Request origin could not be validated.
> > [Thu Jan 07 16:33:42 2010] [error] [client 3.49.42.185] Premature end of script headers: <script_name>.wsgi
>
> > .........
>
> > [Thu Jan 07 17:15:03 2010] [alert] mod_wsgi (pid=27655): Request origin could not be validated.
> > [Thu Jan 07 17:15:03 2010] [error] [client 3.49.42.185] Premature end of script headers: <script_name>.wsgi
> > *** glibc detected *** malloc(): memory corruption: 0x08435ab8 ***
>
> > I know it looks like I am cherry-picking these lines, but there is
> > nothing above or below in the logs other than the snippet with the
> > SiteMinder-specific lines.
>
> > The log level is debug now, so we will see what can be captured.
>
> >> >> > I have the SiteMinder agent running as well, and I notice that a bunch
> >> >> > of seg fault errors associated with it follow, along with malloc errors.
>
> >> >> > Here's my configuration:
>
> >> >> >  Apache/2.0.59 (Unix) mod_jk/1.2.18 mod_wsgi/2.6 Python/2.5.4 configured
>
> >> >> > WSGIApplicationGroup %{GLOBAL}
> >> >> > WSGIDaemonProcess wsgi processes=1 threads=1 display-name=%{GROUP}
>
> >> >> You don't need 'processes=1' as it will default to a single process, and
> >> >> using 'processes=1' instead of allowing it to default has the subtle side
> >> >> effect of setting 'wsgi.multiprocess' to True. You should only use
> >> >> 'processes=1' if load balancing across many Apache instances where
> >> >> each has only a single process in the daemon process group for that
> >> >> application.
>
> >> > I have 4 Apaches running on the box with the same WSGI configuration;
> >> > the box has 4 cores, hence 4 Apaches. I have only 3 simple WSGI
> >> > scripts running: one of them is used for testing, another is
> >> > actively used in production, and the third is only hit by a back-end
> >> > script to refresh data in a singleton object, which the others only
> >> > read. So I guess it is OK to keep processes=1?
>
> >> Just because you have four cores doesn't mean you need to run multiple
> >> Apache instances. Apache is already multiprocess, and within one
> >> mod_wsgi daemon process group you can specify multiple processes as well.
> >> So running multiple Apache instances on the same box with the same
> >> configuration is not necessary to make the most of those cores. Have a
> >> read of:
>
> >>  http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modws...
>
> >> But then, if you are running multiple Apache instances so you can restart
> >> each without interfering with the others, that is a different issue.
> >> Even then, ensure you read:
>
> >>  http://code.google.com/p/modwsgi/wiki/ReloadingSourceCode
>
> >> This is because, for mod_wsgi-hosted Python applications at least, there
> >> are various ways you can trigger reloading of the application without
> >> needing to restart the whole of Apache.
>
> > The reason for multiple Apaches is to be able to isolate issues; it
> > is a long story :)
>
> > I followed your advice on processes and changed the configuration to
> > the following:
>
> > WSGIApplicationGroup %{GLOBAL}
> > WSGIDaemonProcess wsgi threads=1 display-name=%{GROUP}
> > WSGIProcessGroup wsgi
>
> > It did not seem to help (not sure if it was expected to help).
>
> >> >> > WSGIProcessGroup wsgi
>
> >> >> > Any help and insight would be much appreciated..
>
> >> >> I can only suggest trying mod_wsgi 3.1.
>
> >> > Just did.. monitoring to see if I get any errors..
>
> >> >> Other than that, I don't really have an answer. It looks like memory
> >> >> corruption, but whether the source is mod_wsgi, another Apache module
> >> >> or a Python C extension module, I don't know.
>
> >> >> What third party Python modules do you use which may have a C
> >> >> extension module component?
>
> >> >> Anyway, I will have a think about it some more and see if I can come up
> >> >> with any suggestions of things to look for or try. A snippet of the log
> >> >> file covering a longer period of time may be a good starting point.
>
> >> >> Graham
>
> >> > Another question I had was whether slow network connections might
> >> > cause this issue. What are your thoughts on that?
>
> >> Not a process crash as you are seeing.
>
> ...

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/modwsgi?hl=en.
