Various installations of WebKit's application server (via mod_webkit) on separate machines go to lunch sometimes. I've never gone so far as to troubleshoot or faultfind to pinpoint the problem. Instead, a Python script run from cron requests a specific URL every five minutes and, if it gets back Apache's standard "Internal Server Error" page, kills and restarts the AppServer.
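A watchdog like the one just described might look roughly like this. It is a minimal Python 3 sketch, not the actual script: CHECK_URL and RESTART_CMD are hypothetical placeholders, and treating a timeout or refused connection as a failure goes beyond the original check, which only looked for the "Internal Server Error" page.

```python
import subprocess
import urllib.error
import urllib.request

# Hypothetical placeholders: substitute your own page and restart command.
CHECK_URL = "http://localhost/webkit/SomePage"
RESTART_CMD = ["/etc/init.d/webkit", "restart"]


def looks_wedged(status, body):
    """Treat an HTTP 500 or Apache's 'Internal Server Error' page as a hang."""
    return status == 500 or "Internal Server Error" in body


def fetch(url, timeout=30.0):
    """Return (status, body); a timeout or refused connection counts as 500 (assumption)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("latin-1", "replace")
    except urllib.error.HTTPError as err:  # must precede URLError (its base class)
        return err.code, err.read().decode("latin-1", "replace")
    except (urllib.error.URLError, OSError):
        return 500, ""


def check_once(url=CHECK_URL, fetcher=fetch, restart=None):
    """Fetch url once; if it looks wedged, restart and return True."""
    status, body = fetcher(url)
    if not looks_wedged(status, body):
        return False
    if restart is None:
        subprocess.run(RESTART_CMD, check=False)
    else:
        restart()
    return True


if __name__ == "__main__":
    check_once()
```

A crontab entry such as "*/5 * * * * /usr/bin/python3 /path/to/check_webkit.py" (path hypothetical) would run it every five minutes. The fetcher and restart parameters are there only so the logic can be exercised without a live server.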

It's a "production" environment, but it's an internal company server, so five minutes of downtime isn't that annoying.

On Wednesday, August 6, 2003, at 10:07 PM, Hancock, David (DHANCOCK) wrote:

Adam: Thanks for the additional information.  Stephan Diehl also has seen this situation on his systems.
 
I agree about the gap in the PIDs, but most of the time they're contiguous.  We do a sort of "heartbeat" ping on our servers with an HTTP request at least every 5 minutes, which is how we notice the problem. We've got two machines running Apache and WebKit, load-balanced, but each gets hit pretty often.  There's a LOT of memory on these machines (2GB physical); we've typically got 500MB physical free, and swap generally shows 0K used.  We'll start capturing memory data to see whether we really are using some swap space.
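One low-effort way to capture that memory data is to read /proc/meminfo on each heartbeat and log free memory and swap in use. This is a hedged Python 3 sketch: it only looks at the "Name: value kB" fields and computes swap in use as SwapTotal minus SwapFree.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():  # skips header rows without a number
            values[key.strip()] = int(parts[0])
    return values


def swap_used_kb(info):
    """Swap in use, in kB, from parsed meminfo fields (0 if absent)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def snapshot(path="/proc/meminfo"):
    """Read the live file and return the two numbers worth logging."""
    with open(path) as fh:
        info = parse_meminfo(fh.read())
    return {"MemFree": info.get("MemFree"), "SwapUsed": swap_used_kb(info)}


if __name__ == "__main__":
    snap = snapshot()
    print(time.strftime("%Y-%m-%d %H:%M:%S"), snap["MemFree"], snap["SwapUsed"])
```

Appending that line to a log each time the heartbeat runs would show whether swap use ever rises before a wedge.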
 
My understanding of swapping (which, granted, may well be faulty) is that Linux is unlikely to swap something to disk while there's unused physical memory.
 
We are using mod_webkit, and even with the WebKit processes wedged, the port (we're using 8086) is still listening, just not responding.
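That observation fits how TCP listening sockets behave: the kernel completes the handshake and queues the connection even if the application never calls accept(), until the backlog fills. So a bare "is port 8086 open?" check can report success on a wedged AppServer, which is one reason to probe with a real HTTP request through Apache instead. A small Python 3 sketch demonstrating this:

```python
import socket


def port_accepts_connection(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened.

    This only exercises the kernel's listen queue: the handshake succeeds
    even when the server process never calls accept(), so True does NOT
    mean the application behind the port is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

In a test, a socket that calls listen() but never accept() still makes this function return True.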
 
If we were able to reproduce this situation on our development or test systems, we could use the debugger to find out more about what's going on, but in production, our first priority is to get the system responding again.
 
If/when I learn more, I'll follow up to the list.  And if anybody else has some ideas, I'd be grateful to hear them.

Cheers!
--
David Hancock | [EMAIL PROTECTED] | 410-266-4384



-----Original Message-----
From: Adam Kerrison [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 06, 2003 10:26 AM
To: [EMAIL PROTECTED]
Subject: RE: [Webware-discuss] RE: Anyone seen WebKit processes going into a weird state?

I can't say I've experienced this behaviour directly, but a few points:

 
- A process name in brackets does mean "swapped to disk", probably because the process has been inactive for a while (seems likely!)
 
- The gap in the PIDs could just be that another process started at that time; you can't rely on PIDs being contiguous
 
- I have had problems where I had to kill threaded apps when the code raised an exception. In SOME cases the thread dies and the application stops responding (it depends a lot on how the app is designed). I don't think I've seen this specifically with Webware, but if the socket handler dies then the other threads will be waiting for things that will never happen (and the process will be swapped out eventually). I am assuming a lot about how the AppServer works - I don't know that this is right, but I'm sure someone will correct me :-)
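The failure mode in the last point, where one uncaught exception silently ends a worker thread and everything waiting on it hangs, is usually guarded against by catching exceptions per unit of work. The Python 3 sketch below illustrates that pattern with a hypothetical job queue; it is not how Webware's ThreadedAppServer is actually written.

```python
import queue
import threading
import traceback


def guarded_worker(jobs, handle, log):
    """Process jobs from a queue until a None sentinel arrives.

    An exception raised by one job is recorded in `log` and the loop keeps
    going, so a single bad request cannot kill the thread and leave other
    threads waiting for work that will never be done.
    """
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            handle(job)
        except Exception:
            log.append(traceback.format_exc())
        finally:
            jobs.task_done()
```

On Python 3.8 and later, setting threading.excepthook is a complementary measure: it at least logs any thread that does die from an uncaught exception.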
 
If you're using mod_webkit - and assuming that it maintains a connection from Apache to WebKit - you should be able to see this connection via netstat. If the socket handler has died, then the socket may have gone. Using gdb, you should be able to see the threads running and their state, but that's probably less useful with Python.
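A Python-level alternative to gdb is to dump every thread's Python stack from inside the process. The sketch assumes a much newer Python than the 2.2 in this thread: sys._current_frames needs 2.5+, and faulthandler needs 3.3+ and is Unix-only for signal registration. With install_dump_on_signal() called at startup, "kill -USR1 <pid>" would print all thread stacks to stderr without stopping the server.

```python
import faulthandler
import signal
import sys
import threading
import traceback


def dump_all_threads():
    """Return a text dump of every thread's current Python stack."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (ident, names.get(ident, "?")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)


def install_dump_on_signal():
    """On Unix, make SIGUSR1 write all thread tracebacks to stderr.

    faulthandler does this at the C level, so it can still fire when the
    Python-level signal handler machinery would not get a chance to run.
    """
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, all_threads=True)
```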
 
Not sure whether this helps or not; it might be a red herring.
 
Adam


-----Original Message-----
From: Hancock, David (DHANCOCK) [mailto:[EMAIL PROTECTED]
Sent: 06 August 2003 13:28
To: '[EMAIL PROTECTED]'
Subject: [Webware-discuss] RE: Anyone seen WebKit processes going into a weird state?

Sorry to be replying to my own post, but I haven't seen any list traffic related to my question below, so maybe it didn't get out to the list.  The situation described below has occurred several times this week, and in most cases there is a gap in the process numbering.  Every other time I've looked, the "python Launch.py ThreadedAppServer" process numbers are sequential, with no gaps.  They must start up very quickly.  In the list below, there is a gap (25802 is missing).


I'm grasping at straws here.  I think that the process id in brackets with no command line means that the process is swapped to disk, but I'm not sure about that.  When we see the processes looking like they do below, they really ARE wedged, though, and require manual termination.
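For what it's worth, the ps(1) man page says that when a process's arguments are unavailable, ps prints the executable name in brackets instead; that by itself doesn't say why they were unavailable. A hedged Python 3 sketch for checking directly: read /proc/<pid>/cmdline (empty when ps falls back to brackets) and /proc/<pid>/status. VmSwap is reported only by newer kernels, so on a 2.4-era kernel that field would simply be absent.

```python
def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    return [arg.decode("utf-8", "replace") for arg in raw.split(b"\0") if arg]


def parse_status(text):
    """Parse /proc/<pid>/status 'Key:<tab>value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def inspect_pid(pid):
    """Summarise why ps might show a process as [name] (Linux only)."""
    with open("/proc/%d/cmdline" % pid, "rb") as fh:
        args = parse_cmdline(fh.read())
    with open("/proc/%d/status" % pid) as fh:
        status = parse_status(fh.read())
    return {
        "args": args,                        # [] means ps prints [name]
        "state": status.get("State"),        # e.g. "S (sleeping)"
        "swapped": status.get("VmSwap"),     # None if the kernel lacks VmSwap
    }
```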

Cheers!


-----Original Message-----
From: Hancock, David (DHANCOCK)
Sent: Friday, August 01, 2003 4:57 PM
To: [EMAIL PROTECTED]
Subject: Anyone seen WebKit processes going into a weird state?

Several times a week on our production systems, we're seeing our WebKit processes (normally named "python Launch.py ThreadedAppServer") lose their command lines in the output from ps.  They're also thoroughly wedged, and the processes need to be killed by hand to clear this situation.  Has anybody else seen this, and does anyone have ideas to help us troubleshoot?  For now, we're detecting the situation with automated monitoring (and process-killing and WebKit-restarting), but we'd sure like to know how we can prevent it, not just work around it.

Output from ps auxww:

adc      25799  0.1  1.6 130288 34252 ?      SN   Jul28  10:04 [python]
adc      25800  0.0  1.6 130288 34252 ?      SN   Jul28   0:00 [python]
adc      25801  0.0  1.6 130288 34252 ?      SN   Jul28   2:52 [python]
adc      25803  0.0  1.6 130288 34252 ?      SN   Jul28   1:37 [python]
adc      25804  0.0  1.6 130288 34252 ?      SN   Jul28   2:17 [python]
adc      25805  0.0  1.6 130288 34252 ?      SN   Jul28   1:37 [python]
adc      25806  0.0  1.6 130288 34252 ?      SN   Jul28   1:45 [python]
adc      25807  0.0  1.6 130288 34252 ?      SN   Jul28   1:27 [python]
adc      25808  0.0  1.6 130288 34252 ?      SN   Jul28   1:51 [python]
adc      25809  0.0  1.6 130288 34252 ?      SN   Jul28   1:08 [python]
adc      25810  0.0  1.6 130288 34252 ?      SN   Jul28   3:37 [python]
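The automated detection mentioned above could key on exactly this symptom. The Python 3 sketch below is an illustration, not the monitoring actually in use: it picks out processes whose COMMAND column in "ps auxww" output is literally "[python]" and lists any gaps in their PIDs. On the listing above it would report the missing 25802.

```python
def bracketed_pids(ps_text, name="python"):
    """Return sorted PIDs whose COMMAND column is exactly '[name]' in `ps auxww` output."""
    target = "[%s]" % name
    pids = []
    for line in ps_text.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND -> 11 fields
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[1].isdigit() and fields[10].strip() == target:
            pids.append(int(fields[1]))
    return sorted(pids)


def pid_gaps(pids):
    """Return PIDs missing between the smallest and largest of `pids`."""
    if not pids:
        return []
    present = set(pids)
    return [p for p in range(min(pids), max(pids) + 1) if p not in present]
```

The ps text itself could be collected with subprocess.run(["ps", "auxww"], capture_output=True, text=True).stdout on Python 3.7 or later.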


Our setup includes:

Python 2.2
Webware 0.8
RedHat Linux 7.3
A couple of C extensions: DCOracle2 and pymqi (interface to IBM's MQSeries)


Thanks in advance for any ideas and assistance.

P.S. We had an extreme example of something similar several months ago, but even the "[python]" was missing from the ps output. Thus, it didn't look like WebKit was running at all, but a start attempt couldn't bind to the port. We could only find the culprit process with "netstat -anp | grep 8086" run as root.  I don't know if that failure is related, though; it was just weird.
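What "netstat -anp | grep 8086" does on Linux can be approximated from Python by reading /proc directly. This Python 3 sketch is an assumption-laden illustration: it finds LISTEN sockets (state 0A) on the port in /proc/net/tcp, then looks for /proc/<pid>/fd links named "socket:[inode]". Seeing other users' processes needs root, as in the original, and IPv6 listeners would appear in /proc/net/tcp6 instead.

```python
import os

TCP_LISTEN = "0A"  # state code used for LISTEN in /proc/net/tcp


def listening_inodes(proc_net_tcp_text, port):
    """Return socket inodes in LISTEN state on `port` from /proc/net/tcp text."""
    inodes = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 10:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)  # port is hex
        if local_port == port and fields[3] == TCP_LISTEN:
            inodes.append(fields[9])
    return inodes


def pids_owning(inodes, proc="/proc"):
    """Map socket inodes to owning PIDs by scanning /proc/<pid>/fd."""
    wanted = {"socket:[%s]" % i for i in inodes}
    owners = set()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc, entry, "fd")
        try:
            for fd in os.listdir(fd_dir):
                if os.readlink(os.path.join(fd_dir, fd)) in wanted:
                    owners.add(int(entry))
        except OSError:
            continue  # process exited, or permission denied without root
    return sorted(owners)
```

Usage would be roughly: read /proc/net/tcp, call listening_inodes(text, 8086), then pass the result to pids_owning().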

Cheers!



=====================================
Missing Monkey Head
enthralling mysteries of simian cranial abduction
http://neurobashing.com/monkey/
