On Thursday 07 August 2003 15:10, Thomas E Jenkins wrote:
> For what it's worth I've had the same problems.  It always follows my
> other problem, MySQL throwing (2013, 'Lost connection to MySQL server
> during query').  

Once in a while, I've seen the same error message. Interestingly, this was 
always in the early morning, and the problem went away after a while. It looked 
as if the connection had run into a timeout overnight.
Maybe it's possible to check for this specific error message in the DBPool.py 
code and reset all connections if this happens.
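
A minimal sketch of that idea, not the actual DBPool.py code. The
`ResettingPool` class and its `connect` factory argument are hypothetical
names. It assumes the driver raises an exception whose first argument is the
MySQL error code, as in the (2013, 'Lost connection to MySQL server during
query') tuple quoted above. Code 2006 is included as an assumption, since it
usually points at the same kind of dead connection.

```python
import threading

# MySQL client error codes treated as "connection is dead".
# 2013 is the one from this thread; 2006 is added as an assumption.
STALE_CODES = (2006, 2013)


class ResettingPool:
    """Hypothetical pool: hands out connections, rebuilds them all on a stale error."""

    def __init__(self, connect, size=2):
        self._connect = connect          # zero-argument factory returning a connection
        self._size = size
        self._lock = threading.Lock()
        self._conns = [connect() for _ in range(size)]
        self._next = 0

    def _get(self):
        # Round-robin over the pooled connections.
        with self._lock:
            conn = self._conns[self._next % self._size]
            self._next += 1
            return conn

    def reset(self):
        """Close every pooled connection and open fresh ones."""
        with self._lock:
            for conn in self._conns:
                try:
                    conn.close()
                except Exception:
                    pass                 # already dead, nothing more to do
            self._conns = [self._connect() for _ in range(self._size)]

    def run(self, func):
        """Call func(connection); on a stale-connection error, reset and retry once."""
        try:
            return func(self._get())
        except Exception as exc:
            code = exc.args[0] if exc.args else None
            if code not in STALE_CODES:
                raise                    # unrelated error, let the caller see it
            self.reset()
            return func(self._get())
```

The retry happens only once, so a server that is really down still surfaces
as an error instead of looping forever.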

Other than that, I've seen several large processes (about 200MB) that appeared 
and didn't go away. In that case, no convenient error message could be seen, 
but I suspect that it has something to do with swish-e, an external indexer 
that is run via an os.system call once in a while.
But then again, this is not consistent behaviour and happens only very rarely. 
It probably has nothing to do with Webware at all, but rather with running in 
a multithreaded environment.
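
One way to keep an external indexer from lingering is to run it with a time
limit instead of a bare os.system call. This is only a sketch under
assumptions: it uses the `subprocess` module from Python 3 (not available in
the Python 2.2 setup described in this thread), `run_indexer` is a
hypothetical helper name, and the real swish-e command line is not shown
because it isn't given here.

```python
import subprocess
import sys


def run_indexer(argv, timeout_seconds):
    """Run an external command; kill it if it exceeds the timeout.

    Returns the exit code, or None if the command had to be killed.
    """
    try:
        # Passing a list avoids the extra shell that os.system would spawn.
        result = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point.
        return None
    return result.returncode
```

For example, `run_indexer([sys.executable, "-c", "pass"], 60)` runs a
stand-in command with a one-minute limit. Processes that the indexer itself
spawns are not covered by this and might still survive.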

> I get the same symptoms, with the webkit still
> listening but not responding.   I have been unable to reproduce this
> problem reliably, but it does happen about once a day.
>
>
> Date: Thu, 07 Aug 2003 07:55:16 -0400
>
> On Wed, 2003-08-06 at 22:07, Hancock, David (DHANCOCK) wrote:
> > Adam: Thanks for the additional information.  Stephan Diehl also has
> > seen this situation on his systems.
> >
> > I agree about the gap in the PIDs, but most of the time, they're
> > contiguous.  We do sort of a "heartbeat"  ping on our servers with an
> > HTTP request at least every 5 minutes, which is how we notice the
> > problem. We've got two machines running Apache and WebKit,
> > load-balanced, but each gets hit pretty often.  There's a LOT of
> > memory on these machines (2GB physical); we've typically got 500MB
> > physical free and swap generally shows 0K used.  We'll start capturing
> > memory data to see if we really are using some swap space.
> >
> > My understanding of swapping (which, granted, is apt to be faulty) is
> > that Linux isn't apt to swap something to disk while there's unused
> > physical memory.
> >
> > We are using mod_webkit, and even with the WebKit processes wedged,
> > the port (we're using 8086) is still listening, just not responding.
> >
> > If we were able to reproduce this situation on our development or test
> > systems, we could use the debugger to find out more about what's going
> > on, but in production, our first priority is to get the system
> > responding again.
> >
> > If/when I learn more, I'll follow up to the list.  And if anybody else
> > has some ideas, I'd be grateful to hear them.
> >
> > Cheers!
> > --
> > David Hancock | [EMAIL PROTECTED] | 410-266-4384
> >
> >         -----Original Message-----
> >         From: Adam Kerrison [mailto:[EMAIL PROTECTED]
> >         Sent: Wednesday, August 06, 2003 10:26 AM
> >         To: [EMAIL PROTECTED]
> >         Subject: RE: [Webware-discuss] RE: Anyone seen WebKit
> >         processes going into a weird state?
> >
> >
> >         I can't say I've experienced this behaviour directly but a few
> >         points:
> >
> >         - Process name in brackets does mean "swapped to disk"
> >         probably because the process has been inactive for a while
> >         (seems likely!)
> >
> >         - The gap in the PID could just be that another process
> >         started at that time - you can't rely on the PID's being
> >         contiguous
> >
> >         - I have had problems where I had to kill threaded apps
> >         when the code raises an exception. In SOME cases the thread
> >         dies and the application stops responding (depends a lot on
> >         how the app is designed). I don't think I've seen this
> >         specifically with Webware but if the socket handler dies then
> >         the other threads will be waiting for things that will never
> >         happen (and the process will be swapped out eventually). I am
> >         assuming a lot about how the AppServer is working - I don't
> >         know that this is right but I'm sure someone will correct
> >         me :-)
> >
> >         If you're using mod_webkit  - and assuming that it maintains a
> >         connection from apache to webkit - you should be able to see
> >         this connection via netstat. If the socket handler has died
> >         then the socket may have gone. Using gdb you should be able to
> >         see the threads running and their state, but that's probably
> >         less useful in python.
> >
> >         Not sure that this helps or not - might be a red herring
> >
> >         Adam
> >                 -----Original Message-----
> >                 From: Hancock, David (DHANCOCK)
> >                 [mailto:[EMAIL PROTECTED]
> >                 Sent: 06 August 2003 13:28
> >                 To: '[EMAIL PROTECTED]'
> >                 Subject: [Webware-discuss] RE: Anyone seen WebKit
> >                 processes going into a weird state?
> >
> >
> >
> >                 Sorry to be replying to my own post, but I haven't
> >                 seen any list traffic related to my question below, so
> >                 maybe it didn't get out to the list.  The situation
> >                 described below has occurred several times this week,
> >                 and in most cases there is a gap in the process
> >                 numbering.  Every other time I've looked, the "python
> >                 Launch.py ThreadedAppServer" process numbers are
> >                 sequential, with no gaps.  They must start up very
> >                 quickly.  In the list below, there is a gap (25802 is
> >                 missing).
> >
> >                 I'm grasping at straws here.  I think that the process
> >                 id in brackets with no command line means that the
> >                 process is swapped to disk, but I'm not sure about
> >                 that.  When we see the processes looking like they do
> >                 below, they really ARE wedged, though, and require
> >                 manual termination.
> >
> >                 Cheers!
> >                 --
> >                 David Hancock | [EMAIL PROTECTED] | 410-266-4384
> >
> >                          -----Original Message-----
> >                         From:   Hancock, David (DHANCOCK)
> >                         Sent:   Friday, August 01, 2003 4:57 PM
> >                         To:     [EMAIL PROTECTED]
> >                         Subject:        Anyone seen WebKit processes
> >                         going into a weird state?
> >
> >                         Several times a week on our production
> >                         systems, we're seeing our WebKit processes
> >                         (normally entitled "python Launch.py
> >                         ThreadedAppServer") lose their command lines
> >                         in the output from ps.  They're also well
> >                         wedged, and the processes need to be killed by
> >                         hand to clear this situation.  Has anybody
> >                         else seen this and have some ideas to help us
> >                         troubleshoot?  For now, we're detecting the
> >                         situation with automated monitoring (and
> >                         process-killing and webkit-restarting), but
> >                         we'd sure like to know how we can prevent it,
> >                         not just work around it.
> >
> >                         Output from ps auxww:
> >
> >                         adc      25799  0.1  1.6 130288 34252 ?  SN   Jul28  10:04 [python]
> >                         adc      25800  0.0  1.6 130288 34252 ?  SN   Jul28   0:00 [python]
> >                         adc      25801  0.0  1.6 130288 34252 ?  SN   Jul28   2:52 [python]
> >                         adc      25803  0.0  1.6 130288 34252 ?  SN   Jul28   1:37 [python]
> >                         adc      25804  0.0  1.6 130288 34252 ?  SN   Jul28   2:17 [python]
> >                         adc      25805  0.0  1.6 130288 34252 ?  SN   Jul28   1:37 [python]
> >                         adc      25806  0.0  1.6 130288 34252 ?  SN   Jul28   1:45 [python]
> >                         adc      25807  0.0  1.6 130288 34252 ?  SN   Jul28   1:27 [python]
> >                         adc      25808  0.0  1.6 130288 34252 ?  SN   Jul28   1:51 [python]
> >                         adc      25809  0.0  1.6 130288 34252 ?  SN   Jul28   1:08 [python]
> >                         adc      25810  0.0  1.6 130288 34252 ?  SN   Jul28   3:37 [python]
> >
> >                         Our setup includes:
> >
> >                                 Python 2.2
> >                                 Webware 0.8
> >                                 RedHat Linux 7.3
> >                                 A  couple C extensions: DCOracle2 and
> >                                 pymqi (interface to IBM's MQSeries)
> >
> >
> >                         Thanks in advance for any ideas and
> >                         assistance.
> >
> >                         P.S. We had an extreme example of something
> >                         similar several months ago, but even the
> >                         "[python]" was missing from the ps output.
> >                         Thus, it didn't look like WebKit was running
> >                         at all, but a start attempt couldn't bind to
> >                         the port. We could only find the culprit
> >                         process with "netstat -anp | grep 8086" run as
> >                         root.  I don't know if that failure is
> >                         related, though, it was just weird.
> >
> >                         Cheers!
> >                         --
> >                         David Hancock | [EMAIL PROTECTED] |
> >                         410-266-4384
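
The "still listening, just not responding" state described in the quoted
messages can be told apart from a dead server with a small probe. This is a
rough sketch, not the monitoring actually used on those systems: `probe` is a
hypothetical name, and because it sends a plain HTTP request it is meant for
the Apache port. Whether port 8086 would answer such a request is not
something this thread establishes.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a port as 'ok', 'wedged' (accepts but never answers) or 'down'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the connect itself timed out.
        return "down"
    try:
        sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
        data = sock.recv(1)              # any byte back counts as a response
        return "ok" if data else "wedged"
    except OSError:
        # Covers a receive timeout as well as a reset connection.
        return "wedged"
    finally:
        sock.close()
```

A "wedged" result is the case worth a restart. Note that a connect timeout is
reported as "down" here, so a server whose accept queue is completely full
may also show up that way.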



_______________________________________________
Webware-discuss mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/webware-discuss
