[PATCH] Broken sockopt_set in mod_proxy (SO_RCVBUF)

2003-04-03 Thread Adam Sussman

I started seeing a lot of these in the 2.0.45 build:

(22)Invalid argument: apr_socket_opt_set(SO_RCVBUF): Failed to set 
ProxyReceiveBufferSize, using default

I dug into this and discovered to my surprise that APR never implemented SO_RCVBUF
for the unix platform.  Previous versions of APR just ignored options they didn't
understand, but the most recent version now returns an error when it sees an unknown
option.

This patch implements SO_RCVBUF support for apr_socket_opt_set on the unix platform.
It also alters the response code on an unknown option to be APR_ENOTIMPL, which IMHO
is a lot clearer than invalid argument and which would have saved me an hour or so of
scratching my head.
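
For reference, the system call that the patch wraps can be sketched in plain POSIX C.
This is only a minimal sketch and not APR code: set_rcvbuf and get_rcvbuf are
hypothetical helper names, and the only real APIs used are setsockopt() and
getsockopt().  Note that the option value is passed by address, and that the kernel
is free to round or cap the requested size.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of APR): request a receive buffer of
 * `bytes` bytes on `fd`.  Returns 0 on success or the errno value on
 * failure, mirroring the "return errno" convention used in the patch. */
static int set_rcvbuf(int fd, int bytes)
{
    /* setsockopt() takes a pointer to the value plus its size. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1) {
        return errno;
    }
    return 0;
}

/* Hypothetical helper: read back the size the kernel actually granted,
 * or -1 on failure.  The result may differ from what was requested. */
static int get_rcvbuf(int fd)
{
    int bytes = 0;
    socklen_t len = sizeof(bytes);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, &len) == -1) {
        return -1;
    }
    return bytes;
}
```

Calling set_rcvbuf(fd, 16384) on a freshly created socket and then get_rcvbuf(fd)
shows what the kernel really granted; an invalid descriptor makes set_rcvbuf return
EBADF rather than silently succeeding.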

-adam


Index: sockopt.c
===
RCS file: /home/cvspublic/apr/network_io/unix/sockopt.c,v
retrieving revision 1.67
diff -u -r1.67 sockopt.c
--- sockopt.c   24 Feb 2003 23:13:29 -  1.67
+++ sockopt.c   4 Apr 2003 03:06:41 -
@@ -198,6 +198,18 @@
 return APR_ENOTIMPL;
 #endif
 break;
+case APR_SO_RCVBUF:
+#ifdef SO_RCVBUF
+if (apr_is_option_set(sock->netmask, APR_SO_RCVBUF) != on) {
+if (setsockopt(sock->socketdes, SOL_SOCKET, SO_RCVBUF, (void *)&on, sizeof(int)) == -1) {
+return errno;
+}
+apr_set_option(&sock->netmask, APR_SO_RCVBUF, on);
+}
+#else
+return APR_ENOTIMPL;
+#endif
+break;
 case APR_SO_NONBLOCK:
 if (apr_is_option_set(sock->netmask, APR_SO_NONBLOCK) != on) {
 if (on) {
@@ -326,7 +338,7 @@
 #endif
 break;
 default:
-return APR_EINVAL;
+return APR_ENOTIMPL;
 }
 
 return APR_SUCCESS; 


Re: Seg Fault on first SSL hit after startup

2003-02-19 Thread Adam Sussman
On Tue, Feb 18, 2003 at 04:58:27PM -0800, MATHIHALLI,MADHUSUDAN (HP-Cupertino,ex1) wrote:
> I have been having problems with shmht (haven't looked into it for ages). Do
> you get the same problem even with shmcb ?.
>

shmcb seems to be fine.

-adam



Seg Fault on first SSL hit after startup

2003-02-18 Thread Adam Sussman

This seg fault occurs sporadically on the FIRST SSL hit to 
the server immediately after startup.  Subsequent hits to SSL
do not have a problem.

This occurs only with this in the conf:

SSLSessionCache shmht:/usr/local/apache/ssl/ssl_cache 


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2051 (LWP 28822)]
0x40260c7c in memcpy () from /lib/i686/libc.so.6
(gdb) where
#0  0x40260c7c in memcpy () from /lib/i686/libc.so.6
#1  0x0824c7e0 in ?? () at eval.c:41
#2  0x080a25a4 in ssl_scache_store (s=0x820d808, id=0x827e298 
¹\202pÙ[\222\beÆ;-½\001ýàfS²»\231£Ô¯\031\233üÒ\004¯\017\031. , idlen=32, 
expiry=1045611512, sess=0x827e250) at ssl_scache.c:129
#3  0x0809bd32 in ssl_callback_NewSessionCacheEntry (ssl=0x824c7e0, session=0x827e250) 
at ssl_engine_kernel.c:1732
#4  0x080eb86e in ssl_update_cache () at eval.c:41
Cannot access memory at address 0x2
(gdb) up 2
#2  0x080a25a4 in ssl_scache_store (s=0x820d808, id=0x827e298 
¹\202pÙ[\222\beÆ;-½\001ýàfS²»\231£Ô¯\031\233üÒ\004¯\017\031. , idlen=32, 
expiry=1045611512, sess=0x827e250) at ssl_scache.c:129
129 rv = ssl_scache_shmht_store(s, id, idlen, expiry, sess);


Digging into this a bit, the memcpy that is faulting occurs in 
ssl_scache_shmht.c, in ssl_scache_shmht_store, on the line which
says:

memcpy(vp, &expiry, sizeof(time_t));

I'm not sure why this is happening.  My debugging indicates that vp is not NULL.

I am not very familiar with mod_ssl, so any pointers or suggestions would be welcome.
This is happening on 2.0.43 with a statically linked openssl version 0.9.6g on
linux 2.4.18 with libc 2.2.4.

thanks,

-adam

-- 

I believe in Kadath in the cold waste, and Ultima Thule. But you
 cannot prove to me that Harvard Law School actually exists.
- Theodora Goss

I'm not like that, I have a cat, I don't need you.. My cat, and
 about 18 lines of bourne shell code replace you in life.
- anonymous


Adam Sussman
Vidya Media Ventures

[EMAIL PROTECTED]




[PATCH] mod_rewrite and cookie setting

2002-07-16 Thread Adam Sussman


The new cookie setting feature of mod_rewrite adds the Set-Cookie header
to r->headers_out.  Shouldn't this be r->err_headers_out instead?

The error headers are always present whereas the normal headers do not
appear under error conditions.  In applications where I have an apache
module setting cookies, I have always found that setting err_headers_out
gives me the complete coverage that I want.

Thoughts?

-adam

Attached is a patch to set err_headers_out instead:

Index: mod_rewrite.c
===
RCS file: /home/cvspublic/httpd-2.0/modules/mappers/mod_rewrite.c,v
retrieving revision 1.124
diff -u -r1.124 mod_rewrite.c
--- mod_rewrite.c   10 Jul 2002 06:01:10 -  1.124
+++ mod_rewrite.c   16 Jul 2002 17:15:33 -
@@ -4162,12 +4162,7 @@
 : NULL, 
   NULL);
 
-
-/* 
- * XXX: should we add it to err_headers_out as well ?
- * if we do we need to be careful that only ONE gets sent out
- */
-apr_table_add(r->headers_out, "Set-Cookie", cookie);
+apr_table_add(r->err_headers_out, "Set-Cookie", cookie);
 rewritelog(r, 5, "setting cookie '%s' to '%s'", var, val);
 }
 }




Re: [PATCH] mod_rewrite and cookie setting

2002-07-16 Thread Adam Sussman

On Tue, Jul 16, 2002 at 10:26:49AM -0700, Ian Holsman wrote:
> Adam Sussman wrote:
> > The new cookie setting feature of mod_rewrite adds the Set-Cookie header
> > to r->headers_out.  Shouldn't this be r->err_headers_out instead?
> > 
> > The error headers are always present whereas the normal headers do not
> > appear under error conditions.  In applications where I have an apache
> > module setting cookies, I have always found that setting err_headers_out
> > gives me the complete coverage that I want.
> > 
> > Thoughts?
> yep.. a couple of them
> the original patch has err_headers_out and it didn't work as we would 
> get multiple cookies back on a simple request on GET / on a standard 
> install.
>

hmm... I cannot reproduce this behaviour.  So far as I can see, the only
difference is whether or not the cookie header appears in non-200 responses.
Can you show me the configuration you used?

-adam



Difficulties with SSL and mod_proxy

2002-04-06 Thread Adam Sussman


I am having trouble with certain combinations of SSL and mod_proxy.
If I have apache 2.0 acting as an SSL enabled server, I can get it
to proxy to a remote SSL server but NOT to a remote clear text server.

So, while this configuration works:

  ProxyPass /foo/ https://otherhost/bar
  SSLProxyEngine On
  .. etc ...

This does not:

  ProxyPass /foo/ http://otherhost/bar

I seem to recall this working several releases ago.

I'm not sure where to look on this, but here are some more details:

1) Apache clearly makes the TCP connection to the downstream server
   but doesn't actually send any data.

2) After the connection to the downstream server is made, it seems to just
   freeze waiting for input.  Here is a stacktrace taken when it has reached
   this state (this is not a segfault or anything):

0x402b00ee in __select () from /lib/i686/libc.so.6
(gdb) bt
#0  0x402b00ee in __select () from /lib/i686/libc.so.6
#1  0x40043e14 in __DTOR_END__ () at eval.c:41
#2  0x40038325 in apr_recv () at eval.c:41
#3  0x4001d256 in socket_read () at eval.c:41
#4  0x080c6f76 in core_input_filter (f=0x827ba00, b=0x827b9c0, mode=AP_MODE_READBYTES, 
block=APR_BLOCK_READ, readbytes=11) at core.c:3430
#5  0x080be000 in ap_get_brigade (next=0x827ba00, bb=0x827b9c0, 
mode=AP_MODE_READBYTES, block=APR_BLOCK_READ, readbytes=11) at util_filter.c:508
#6  0x0808adca in bio_bucket_in_read (bio=0x8227388, in=0x8281ec8 , inl=11) at 
ssl_engine_io.c:395
#7  0x080fa95d in BIO_read () at eval.c:41
Cannot access memory at address 0xb
(gdb) up 4
#4  0x080c6f76 in core_input_filter (f=0x827ba00, b=0x827b9c0, mode=AP_MODE_READBYTES, 
block=APR_BLOCK_READ, readbytes=11) at core.c:3430
3430        rv = apr_bucket_read(e, &str, &len, block);
(gdb) print *f
$1 = {frec = 0x8195b90, ctx = 0x827b9f0, next = 0x0, r = 0x0, c = 0x823e280}


There's a little bit of weirdness here.  This is a single process httpd, not threaded.

The connection the filter is attached to is that of mod_proxy to the downstream server,
which is not SUPPOSED to be SSL.  The filter stack on this connection is the ssl/tls
filter, then core_in.  Since the downstream is not an SSL server, this seems incorrect.
Also, at this stage, no data has been sent to the downstream server, so waiting on
input from the connection doesn't make sense.

Any pointers here would be helpful as I am not sure where to look.

-adam



Re: [PATCH] make mod_proxy not accept HTTP/0.9

2002-04-02 Thread Adam Sussman

> This looks really arcane to me though - there are very few HTTP/0.9
> servers out there that I am aware of to start with, adding functionality
> to specifically not support them seems like software bloat to me.
> 
> Is this a real problem in your installation?

This is really a problem of reliability.  Apache will take anything that
doesn't look like a valid status line and assume that it is HTTP/0.9.
There's no control.  Corrupted packets alone could cause this assumption
to be made when it is not appropriate.

This is not an issue of my system running across the odd old-fashioned
server.  It is an issue of it willfully misinterpreting a broken system.
We should at least give ourselves a measure of control in this situation.

In our particular application here at Ticketmaster, we will be relying
on a correct assessment of the response from a downstream server.  Mistaking
a broken system for a VALID 0.9 response is not acceptable.

-adam




[PATCH] make mod_proxy not accept HTTP/0.9

2002-03-29 Thread Adam Sussman


This patch adds a configuration directive ProxyRequireValidHTTPStatus.
When enabled, mod_proxy will require a valid HTTP status line from the
destination server and throw a 502 Bad Gateway error if it does not
get it.  Basically, this disallows backasswards responses.

Why would one want to do this?  Well, I have a setup where my handler
is first attempting one proxy destination, and if that does not work,
it tries another.  It works by discarding the output of any response
that isn't a 200 and then trying another gateway.

If the gateway doesn't return a valid HTTP 1.0 or better status line,
mod_proxy assumes a 200 OK response.  In my environment, I control all
the gateway servers so I know that a working gateway will always give
me a real HTTP status line.  If it doesn't, I want to consider it a
bad gateway and try another.
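
The decision described above can be sketched as a small standalone C program.  This
is only an illustration under stated assumptions: looks_like_status_line,
classify_response and the proxy_verdict enum are hypothetical names made up for the
sketch, and the check is a hand-written approximation of "HTTP/d.d ddd", not the
test mod_proxy itself performs.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical verdicts for the first line read from the backend. */
enum proxy_verdict {
    PROXY_PARSE_STATUS,   /* a real status line: parse it as usual      */
    PROXY_ASSUME_HTTP09,  /* no status line: treat as HTTP/0.9, 200 OK  */
    PROXY_BAD_GATEWAY     /* no status line and strict mode: answer 502 */
};

/* Return 1 if `line` begins with "HTTP/d.d ddd" followed by a space or
 * the end of the line, otherwise 0. */
static int looks_like_status_line(const char *line)
{
    size_t i;

    if (strncmp(line, "HTTP/", 5) != 0 || strlen(line) < 12) {
        return 0;
    }
    if (!isdigit((unsigned char)line[5]) || line[6] != '.' ||
        !isdigit((unsigned char)line[7]) || line[8] != ' ') {
        return 0;
    }
    for (i = 9; i < 12; i++) {
        if (!isdigit((unsigned char)line[i])) {
            return 0;
        }
    }
    return line[12] == '\0' || line[12] == ' ' ||
           line[12] == '\r' || line[12] == '\n';
}

/* `require_valid_status` plays the role of the proposed
 * ProxyRequireValidHTTPStatus flag. */
static enum proxy_verdict classify_response(const char *first_line,
                                            int require_valid_status)
{
    if (looks_like_status_line(first_line)) {
        return PROXY_PARSE_STATUS;
    }
    return require_valid_status ? PROXY_BAD_GATEWAY : PROXY_ASSUME_HTTP09;
}
```

With the flag off, a first line such as "<html>" is classified as an HTTP/0.9
response; with the flag on, the same line is classified as a bad gateway, which is
what lets a handler move on to the next gateway.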

-adam


Index: mod_proxy.c
===
RCS file: /home/cvspublic/httpd-2.0/modules/proxy/mod_proxy.c,v
retrieving revision 1.76
diff -u -r1.76 mod_proxy.c
--- mod_proxy.c 21 Mar 2002 12:05:45 -  1.76
+++ mod_proxy.c 30 Mar 2002 01:40:52 -
@@ -502,6 +502,7 @@
 ps->preserve_host =0;
 ps->timeout=0;
 ps->timeout_set=0;
+ps->require_valid_http_status=0;
 return ps;
 }
 
@@ -833,6 +834,16 @@
 }
 
 static const char *
+set_require_valid_http_status(cmd_parms *parms, void *dummy, int flag)
+{
+proxy_server_conf *psf =
+ap_get_module_config(parms->server->module_config, &proxy_module);
+
+psf->require_valid_http_status = flag;
+return NULL;
+}
+
+static const char *
 set_recv_buffer_size(cmd_parms *parms, void *dummy, const char *arg)
 {
 proxy_server_conf *psf =
@@ -1041,6 +1052,8 @@
 AP_INIT_TAKE1("ProxyTimeout", set_proxy_timeout, NULL, RSRC_CONF,
  "Set the timeout (in seconds) for a proxied connection. "
  "This overrides the server timeout"),
+AP_INIT_FLAG("ProxyRequireValidHTTPStatus", set_require_valid_http_status, NULL, RSRC_CONF,
+ "on if proxy should not accept responses that don't give a valid HTTP 1.0 (or better) status line"),
  
 {NULL}
 };
Index: mod_proxy.h
===
RCS file: /home/cvspublic/httpd-2.0/modules/proxy/mod_proxy.h,v
retrieving revision 1.76
diff -u -r1.76 mod_proxy.h
--- mod_proxy.h 13 Mar 2002 20:47:53 -  1.76
+++ mod_proxy.h 30 Mar 2002 01:40:52 -
@@ -196,6 +196,8 @@
 int timeout;
 int timeout_set;
 
+int require_valid_http_status;
+
 } proxy_server_conf;
 
 typedef struct {
Index: proxy_http.c
===
RCS file: /home/cvspublic/httpd-2.0/modules/proxy/proxy_http.c,v
retrieving revision 1.138
diff -u -r1.138 proxy_http.c
--- proxy_http.c    21 Mar 2002 12:05:45 -  1.138
+++ proxy_http.c    30 Mar 2002 01:40:52 -
@@ -730,6 +730,12 @@
 p_conn->close += 1;
 origin->keepalive = 0;
 }
+} else if (conf->require_valid_http_status) {
+apr_socket_close(p_conn->sock);
+backend->connection = NULL;
+return ap_proxyerror(r, HTTP_BAD_GATEWAY,
+apr_pstrcat(p, "Corrupt status line returned by remote server: ", buffer, NULL));
 } else {
 /* an http/0.9 response */
 backasswards = 1;



[PATCH/PROPOSAL] add server_limit and thread_limit to scoreboard

2002-02-20 Thread Adam Sussman


I know this idea isn't totally popular, but I thought I would throw this
out and see what people think.  Aaron's most recent patch to the scoreboard
creation logic allows you to make the apache scoreboard shared memory image
accessible to external programs.  This is very useful and we have employed
this same methodology to monitor apache 1.3 in the past.

The problem with the new scoreboard though is that its size depends on values
set in the configuration files rather than on compile-time constants.  There's no
easy way for an external program to be adaptable without parsing the conf files.

This patch adds the configured server_limit and thread_limit elements to the
global score and sets them at the time the scoreboard is initialized.

If there is a better way to derive this information, someone please tell me!
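
To make the use case concrete, here is a rough sketch of how an external reader
could navigate the image once the two limits are stored in the global header.  The
demo_* types are stand-ins invented for this example (the real global_score,
process_score and worker_score live in scoreboard.h), and the layout assumed is the
global record, then server_limit process records, then server_limit * thread_limit
worker records.

```c
#include <stddef.h>

/* Stand-in record types, assumed for illustration only. */
typedef struct { int server_limit; int thread_limit; } demo_global;
typedef struct { int pid; } demo_process;
typedef struct { int status; } demo_worker;

/* Total size of the image for the given limits. */
static size_t demo_scoreboard_size(int server_limit, int thread_limit)
{
    return sizeof(demo_global)
         + sizeof(demo_process) * (size_t)server_limit
         + sizeof(demo_worker) * (size_t)server_limit * (size_t)thread_limit;
}

/* Byte offset of the worker record for (child, thread), computed only
 * from the limits found in the global header. */
static size_t demo_worker_offset(const demo_global *g, int child, int thread)
{
    size_t workers_start = sizeof(demo_global)
                         + sizeof(demo_process) * (size_t)g->server_limit;

    return workers_start
         + sizeof(demo_worker)
           * ((size_t)child * (size_t)g->thread_limit + (size_t)thread);
}
```

The point of the sketch is that both functions need nothing but the two limits, so a
monitoring program that maps the image can read them from the first record instead
of parsing httpd.conf.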

-adam

Index: include/scoreboard.h
===
RCS file: /home/cvspublic/httpd-2.0/include/scoreboard.h,v
retrieving revision 1.41
diff -u -r1.41 scoreboard.h
--- include/scoreboard.h    19 Feb 2002 21:09:27 -  1.41
+++ include/scoreboard.h    20 Feb 2002 21:41:53 -
@@ -159,6 +159,8 @@
 };
 
 typedef struct {
+int server_limit;
+int thread_limit;
 ap_scoreboard_e sb_type;
 ap_generation_t running_generation;/* the generation of children which
  * should still be serving requests. */
Index: server/scoreboard.c
===
RCS file: /home/cvspublic/httpd-2.0/server/scoreboard.c,v
retrieving revision 1.57
diff -u -r1.57 scoreboard.c
--- server/scoreboard.c 19 Feb 2002 21:09:27 -  1.57
+++ server/scoreboard.c 20 Feb 2002 21:41:53 -
@@ -158,6 +158,8 @@
 more_storage += thread_limit * sizeof(worker_score);
 }
 ap_assert(more_storage == (char*)shared_score + scoreboard_size);
+ap_scoreboard_image->global->server_limit = server_limit;
+ap_scoreboard_image->global->thread_limit = thread_limit;
 }
 
 /**




Re: has anybody seen worker segfaults?

2002-02-19 Thread Adam Sussman

On Tue, Feb 19, 2002 at 12:33:58PM -0500, Jeff Trawick wrote:
> Aaron Bannert [EMAIL PROTECTED] writes:
> 
> > > Maybe this is a hint...  For a couple of the restart iterations,
> > > worker on AIX logs this:
> > > 
> > > [crit] ap_queue_push failed with error code -1
> > 
> > This will only happen in ap_queue_push when apr_thread_mutex_lock or
> > apr_thread_mutex_unlock fail (Yes, I do error checking on the
> > pthread lock/unlock cases *grin*).
> > 
> > I'm guessing this is a problem with pthread mutexes on whatever version
> > of linux runs on RH6.2?
> 
> That log message was seen on AIX, not RH.
>

This would not be the first time pthread problems have been seen under
linux though.  On RH 7.1 we get segfaults in the middle of the fork
call with the prefork mpm under high load.  We still haven't been able
to figure out why this is happening, but it appears to be a problem
with linux pthreads.

Has anyone else been having problems with pthreads under linux?

-adam





Re: [PATCH] new scoreboard creation logic, remove DEFAULT_SCOREBOARD from MPMs

2002-02-15 Thread Adam Sussman

> I had to modify the MPMs so they wouldn't try to set ap_scoreboard_fname
> any more. This #define is now fully owned by the scoreboard.c file.
> (Might we want to namespace-protect that #define? I don't know.)
> 
> I'm posting this here for feedback because it is a big change and could
> use some testing on other platform/MPM combos, but I'd also like to
> wait until the current release process finishes.
>

This works well with prefork and worker under Linux.  I have a couple
of comments though:

1) There are some not infrequent cases I have run into where apache
   needs to be killed (for unrelated reasons) and the shared memory
   segment does not get cleaned up.  When this happens, you can't restart
   the server.  You get a file exists error and apache refuses to start
   up.  This is easy to fix with ipcrm but the error is confusing and 
   does not make the solution obvious.

   It would be nice if apache would clean up the shared memory segment
   if it sees it, or have a more meaningful error.

2) Unless I am missing something, there does not seem to be an easy way
   for an external application accessing the scoreboard to know how to
   navigate the data structure.  You have to know the server limit and
   thread limit or else you run into problems.  It would be nice to
   be able to derive those values from the scoreboard image instead of
   the httpd.conf file.

-adam



[PATCH] mod_ssl segfault on child init

2002-02-15 Thread Adam Sussman


If the file specified by SSLMutex cannot be created (because the directory
does not exist for example), children will segfault on init without giving
any reason that the user can figure out.  This happens because the module
init in the parent never checks to see if the mutex initialization succeeded.
This patch adds this check and a user-friendly error message.

-adam


Index: ssl_engine_init.c
===
RCS file: /home/cvspublic/httpd-2.0/modules/ssl/ssl_engine_init.c,v
retrieving revision 1.24
diff -u -r1.24 ssl_engine_init.c
--- ssl_engine_init.c   11 Jan 2002 06:05:18 -  1.24
+++ ssl_engine_init.c   16 Feb 2002 00:16:30 -
@@ -214,7 +214,7 @@
 /*
  *  initialize the mutex handling and session caching
  */
-ssl_mutex_init(s, p);
+if (!ssl_mutex_init(s, p)) return HTTP_INTERNAL_SERVER_ERROR;
 ssl_scache_init(s, p);
 
 /*
Index: ssl_engine_mutex.c
===
RCS file: /home/cvspublic/httpd-2.0/modules/ssl/ssl_engine_mutex.c,v
retrieving revision 1.9
diff -u -r1.9 ssl_engine_mutex.c
--- ssl_engine_mutex.c  11 Jan 2002 06:05:18 -  1.9
+++ ssl_engine_mutex.c  16 Feb 2002 00:16:30 -
@@ -70,8 +70,12 @@
 return TRUE;
 
 if (apr_lock_create(&mc->pMutex, APR_MUTEX, APR_LOCKALL, APR_LOCK_DEFAULT,
-mc->szMutexFile, p) != APR_SUCCESS)
+mc->szMutexFile, p) != APR_SUCCESS) {
+ssl_log(s, SSL_LOG_CRIT|SSL_ADD_ERRNO,
+   "Cannot create SSLMutex file `%s'",
+mc->szMutexFile);
 return FALSE;
+}
 return TRUE;
 }




Re: prefork segfaults under load

2002-02-11 Thread Adam Sussman

> I agree that disabling threads is covering up a problem, but I suspect
> that the problem is in glibc and not in Apache.
> 
> Some rather lame debug suggestions: 
> 
> 1) make sure you have the latest glibc... maybe the problem got fixed

Upgrading to the latest glibc does not seem to help.

> 2) make sure you aren't running out of memory

We're not.

> 3) grab the sources for the level of glibc you have and try to get
>    some idea of why __pthread_reset_main_thread() might segfault

There seem to be a number of ways that this could dump core and so far
we aren't having any luck tracking this down.  The best we can come
up with is that there is some stack corruption happening somewhere.

The latest CVS snapshot seems even more unstable, by the way.

Any other ideas we can chase after?

-adam





Re: prefork segfaults under load

2002-02-11 Thread Adam Sussman

 
> Are you using APR HEAD?  We fixed a bug in pools, which was basically
> writing too much in too little space.

Yes. We are using HEAD on APR, APR-UTIL and httpd-2.0.

-adam





Re: prefork segfaults under load

2002-02-06 Thread Adam Sussman

On Tue, Feb 05, 2002 at 10:45:07PM -0500, Jeff Trawick wrote:
> Adam Sussman [EMAIL PROTECTED] writes:
> 
> > I'm seeing a lot of error messages like this in my error log under load with
> > lots of children (1300 or so):
> > 
> ...
> > #0  pthread_sighandler (signo=11, ctx=
> > {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, __dsh = 
> > 0, edi = 1074716232, esi = 0, ebp = 3221222696, esp = 3221222656, ebx = 1074726616, 
> > edx = 1, ecx = 0, eax = 0, trapno = 13, err = 0, eip = 1074677988, cs = 35, __csh = 
> > 0, eflags = 66070, esp_at_signal = 3221222656, ss = 43, __ssh = 0, fpstate = 0x0, 
> > oldmask = 2147483648, cr2 = 0})
> > at signals.c:87
> > #1  signal handler called
> > #2  __pthread_reset_main_thread () at internals.h:372
> > #3  0x400e40e5 in __fork () at ptfork.c:92
> > #4  0x080762e8 in make_child (s=0x80b9378, slot=1343) at prefork.c:770
> > #5  0x0807671b in perform_idle_server_maintenance (p=0x80b7578) at prefork.c:963
> > #6  0x08076a7e in ap_mpm_run (_pconf=0x80b7578, plog=0x80e9640, s=0x80b9378) at 
> > prefork.c:1120
> > #7  0x0807d5d2 in main (argc=1, argv=0xbb54) at main.c:501
> > #8  0x4010d177 in __libc_start_main (main=0x807cca4 main, argc=1, 
> > ubp_av=0xbb54, init=0x805e22c _init, 
> > fini=0x809e010 _fini, rtld_fini=0x4000e184 _dl_fini, stack_end=0xbb4c) 
> > at ../sysdeps/generic/libc-start.c:129
> > (gdb)   
> 
> I would try adding --disable-threads to the configure invocation to
> see if that helps.  It looks like the pthreads library is puking.
> 
>   make distclean
>   ./configure --disable-threads old-parameters
>   make && make install
>

That fixed the problem.  So, is this the right solution?  Should configure always
assume --disable-threads when it sees --with-mpm=prefork?

Given that this problem does not show up with low numbers of children, I am kind
of wondering if disabling threads is just covering up some problem that should
not affect prefork.

Thoughts?

-adam



prefork segfaults under load

2002-02-05 Thread Adam Sussman


I'm seeing a lot of error messages like this in my error log under load with
lots of children (1300 or so):

[Tue Feb 05 12:52:17 2002] [notice] child pid 32299 exit signal Segmentation fault 
(11), possible coredump in /tmp
[Tue Feb 05 12:52:17 2002] [notice] child pid 32298 exit signal Segmentation fault 
(11), possible coredump in /tmp
[Tue Feb 05 12:52:17 2002] [notice] child pid 32297 exit signal Segmentation fault 
(11), possible coredump in /tmp
[Tue Feb 05 12:52:17 2002] [notice] child pid 32296 exit signal Segmentation fault 
(11), possible coredump in /tmp
...etc...

This is happening with the prefork mpm and seems to occur only with
new children being spawned by perform_idle_server_maintenance due to a
load spike.  This does not ever happen with low numbers of children.

The interesting thing is that it appears as if the child never makes it
very far after the fork command.  Various debugging code just after the
fork in the child produces nothing.

Below is the stack trace.  At first glance it looks like a parent
trace, but the parent never dies.  It very much looks like the child is
segfaulting from within the fork itself.

Maybe there's some signal weirdness going on here?  The processes which don't
core are often hard to kill off.

#0  pthread_sighandler (signo=11, ctx=
  {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, __dsh = 0, 
edi = 1074716232, esi = 0, ebp = 3221222696, esp = 3221222656, ebx = 1074726616, edx = 
1, ecx = 0, eax = 0, trapno = 13, err = 0, eip = 1074677988, cs = 35, __csh = 0, 
eflags = 66070, esp_at_signal = 3221222656, ss = 43, __ssh = 0, fpstate = 0x0, oldmask 
= 2147483648, cr2 = 0})
at signals.c:87
#1  signal handler called
#2  __pthread_reset_main_thread () at internals.h:372
#3  0x400e40e5 in __fork () at ptfork.c:92
#4  0x080762e8 in make_child (s=0x80b9378, slot=1343) at prefork.c:770
#5  0x0807671b in perform_idle_server_maintenance (p=0x80b7578) at prefork.c:963
#6  0x08076a7e in ap_mpm_run (_pconf=0x80b7578, plog=0x80e9640, s=0x80b9378) at 
prefork.c:1120
#7  0x0807d5d2 in main (argc=1, argv=0xbb54) at main.c:501
#8  0x4010d177 in __libc_start_main (main=0x807cca4 main, argc=1, ubp_av=0xbb54, 
init=0x805e22c _init, 
fini=0x809e010 _fini, rtld_fini=0x4000e184 _dl_fini, stack_end=0xbb4c) at 
../sysdeps/generic/libc-start.c:129
(gdb)   

Any clues or pointers would be appreciated.

-adam



Re: 2.0.31 shutdown after heavy load + core dumps on heavy load

2002-02-04 Thread Adam Sussman


I'm seeing the same thing with the prefork mpm under linux with lots of load.
Possibly it is something outside of the mpms themselves?  The processes definitely
don't go away until you kill them with SIGKILL a few times.  One thing I noticed is
that, under load, processes seem to spend a lot of time in the 'close connection'
state.  By the time you have to issue the kills, mod_status shows idle children.
Perhaps there is something that is not getting cleaned up properly?

On a possibly related note, I am seeing segfaults in prefork children under load when
the number of children is high (over 1000).  The stacktrace looks like so and doesn't
make a lot of sense:

#0  pthread_sighandler (signo=11, ctx=
  {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, __dsh = 0, 
edi = 1076506184, esi = 0, ebp = 3221223720, esp = 3221223680, ebx = 1076516568, edx = 
1, ecx = 0, eax = 0, trapno = 13, err = 0, eip = 1076467940, cs = 35, __csh = 0, 
eflags = 66070, esp_at_signal = 3221223680, ss = 43, __ssh = 0, fpstate = 0xb680, 
oldmask = 2147483648, cr2 = 0}) at signals.c:87
87  signals.c: No such file or directory.
in signals.c
(gdb) where
#0  pthread_sighandler (signo=11, ctx=
  {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, __dsh = 0, 
edi = 1076506184, esi = 0, ebp = 3221223720, esp = 3221223680, ebx = 1076516568, edx = 
1, ecx = 0, eax = 0, trapno = 13, err = 0, eip = 1076467940, cs = 35, __csh = 0, 
eflags = 66070, esp_at_signal = 3221223680, ss = 43, __ssh = 0, fpstate = 0xb680, 
oldmask = 2147483648, cr2 = 0}) at signals.c:87
#1  signal handler called
#2  __pthread_reset_main_thread () at internals.h:372
#3  0x402990e5 in __fork () at ptfork.c:92
#4  0x0809b2cd in ap_graceful_stop_signalled () at eval.c:41
#5  0x0809b643 in ap_graceful_stop_signalled () at eval.c:41
#6  0x0809ba62 in ap_mpm_run () at eval.c:41
#7  0x080a24ee in main () at eval.c:41
#8  0x402c2177 in __libc_start_main (main=0x80a1b9c main, argc=1, ubp_av=0xbb44, 
init=0x80637a8 _init, 
fini=0x80c2f50 _fini, rtld_fini=0x4000e184 _dl_fini, stack_end=0xbb3c) at 
../sysdeps/generic/libc-start.c:129

Anyone have an idea how to track this down?  The ap_graceful_stop_signalled function
does *nothing* in the prefork mpm.

-adam


On Mon, Feb 04, 2002 at 02:13:52PM -0500, MATHIHALLI,MADHUSUDAN (HP-Cupertino,ex1) wrote:
> This was with the worker MPM.. I initially suspected the latency - 'waited
> for about 5 minutes, and then nothing happened till I issued a kill -9
> command. I'll try again today, and will probably post the stack trace of the
> parent process when it happens.
> 
> Thanks
> -Madhu
> 
> -Original Message-
> From: Greg Ames [mailto:[EMAIL PROTECTED]]
> Sent: Monday, February 04, 2002 9:58 AM
> To: [EMAIL PROTECTED]
> Subject: Re: 2.0.31 shutdown after heavy load
> 
> 
> MATHIHALLI,MADHUSUDAN (HP-Cupertino,ex1) wrote:
> > 
> > Hi,
> > I'm getting the following message when I try to stop apache after a
> > stress-test on HPUX (using webstone).. In spite of the SIGKILL message, the
> > process does not exit.. A second attempt is successful.. Any clues regarding
> > what may be happening is appreciated..
> > 
> > [error] child process 5891 still did not exit, sending a SIGKILL.
> 
> which MPM?  and have you tried this test with different results before?
> 
> I didn't think we could do anything to block SIGKILL.  Could this be just
> dispatching latency?
> 
> Greg





Re: [PATCH] mod_proxy truncates status line

2001-12-30 Thread Adam Sussman

On Sun, Dec 30, 2001 at 02:58:16PM +0200, Graham Leggett wrote:
> Adam Sussman wrote:
> 
> > Mod_proxy truncates the status line returned by the proxied
> > server.  One character gets snipped off of the end of the
> > status line.
> 
> Are you 100% sure the buffer is big enough to do this? If the buffer is
> of size len the zero will be written past the end of the buffer.
>

In the current code, len is strlen(buffer) so it can be safely assumed
to be one less than the length of the buffer (provided of course that
ap_proxy_string_read can be trusted).

In any case, the specific setting of a null character in a way that truncates
valid data is not appropriate here.  Buffer and len must be sized appropriately.
I believe that they are correct.
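
The difference between the two statements is easy to show outside of Apache.  The
following is a small standalone sketch, with hypothetical function names, that
assumes (as above) that len is strlen(buffer) at the point of the assignment:

```c
#include <string.h>

/* Behaviour before the patch: the pre-decrement moves the terminator one
 * position to the left, so the last character of the line is lost.
 * The len > 0 guard is an addition for this sketch only. */
static void terminate_before_patch(char *buffer)
{
    size_t len = strlen(buffer);

    if (len > 0) {
        buffer[--len] = '\0';
    }
}

/* Behaviour after the patch: with len == strlen(buffer) the write lands on
 * the existing terminator, so the contents are left unchanged and nothing
 * is written past the end of the string. */
static void terminate_after_patch(char *buffer)
{
    size_t len = strlen(buffer);

    buffer[len] = '\0';
}
```

Applied to "HTTP/1.1 200 OK", the first function leaves "HTTP/1.1 200 O" and the
second leaves the line as it was.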

-adam


> > Index: modules/proxy/proxy_http.c
> > ===
> > RCS file: /home/cvspublic/httpd-2.0/modules/proxy/proxy_http.c,v
> > retrieving revision 1.114
> > diff -u -r1.114 proxy_http.c
> > --- proxy_http.c    19 Dec 2001 16:32:01 -  1.114
> > +++ proxy_http.c    29 Dec 2001 00:12:21 -
> > @@ -689,7 +689,7 @@
> >  "server: ", buffer, NULL));
> >  }
> >  backasswards = 0;
> > -buffer[--len] = '\0';
> > +buffer[len] = '\0';
> > 
> >  buffer[12] = '\0';
> >  r->status = atoi(&buffer[9]);
> 
> Regards,
> Graham
> -- 
> -
> [EMAIL PROTECTED]  There's a moon
>   over Bourbon Street
>   tonight...






Re: [PATCH] mod_proxy infinite cpu eating loop

2001-12-29 Thread Adam Sussman


hmm, so I tried out this patch and found that it does work correctly
for most cases and it does solve the original infinite loop problem.
However, it appears to have introduced a new infinite loop problem
as well as some truncation of proxy data.

Once status and header data have been read (or attempted to be read
in the case of HTTP/0.9), mod_proxy is busy waiting for body content.
This shows up as 100% cpu on my setup.  The loop where this is happening
is based on a non-blocking call to ap_get_brigade() in proxy_http.c:856.
Can anyone tell me why this call should not block?
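
The busy-wait pattern itself can be shown with plain POSIX calls.  This is not the
APR brigade API and not the mod_proxy code, just a hedged sketch with hypothetical
helper names: a non-blocking read() on an empty descriptor returns at once with
EAGAIN, so a loop built on it spins, whereas poll() lets the process sleep until
data arrives.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: switch `fd` to non-blocking mode.
 * Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: sleep until `fd` is readable or `timeout_ms`
 * expires.  Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd p;
    int rc;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;

    rc = poll(&p, 1, timeout_ms);
    if (rc <= 0) {
        return rc;
    }
    return (p.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

A reader that calls wait_readable() before each non-blocking read uses no cpu while
the peer is silent, which is the behaviour one would expect from a blocking read in
the loop described above.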

In the case of a HTTP/0.9 response, the line feed on the first line
(where status is tested) is eaten and never shows up in the output.
I suspect that is because of ap_rgetline().

Lastly, proxy_ftp also uses ap_proxy_string_read and will need to be
dealt with if we trash that function.

-adam

On Sat, Dec 29, 2001 at 08:02:32AM -0500, Bill Stoddard wrote:
> I spent a bit of time looking at this one and I am pretty sure this is not the right
> patch. The problem is that ap_proxy_string_read() is completely broken. Among other
> things, it completely chokes if the 'string' spans multiple brigades.
> ap_proxy_string_read should be trashed and something like this patch should be used
> instead (not tested):
> 
> Index: proxy_http.c
> ===
> RCS file: /home/cvs/httpd-2.0/modules/proxy/proxy_http.c,v
> retrieving revision 1.114
> diff -u -r1.114 proxy_http.c
> --- proxy_http.c 19 Dec 2001 16:32:01 - 1.114
> +++ proxy_http.c 29 Dec 2001 12:57:09 -
> @@ -657,6 +657,22 @@
>  while (received_continue) {
>  apr_brigade_cleanup(bb);
> 
> +while ((len = ap_getline(buffer, sizeof(buffer), rp, 0)) <= 0) {
> +if (len < 0) {
> +/* return status... what? timeout? connection dropped?
> + * for now, just use what was returned in the original broken code
> + * set rp->aborted?
> + */
> +apr_socket_close(p_conn->sock);
> +backend->connection = NULL;
> +ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r,
> +  "proxy: error reading status line from remote "
> +  "server %s", p_conn->name);
> +return ap_proxyerror(r, HTTP_BAD_GATEWAY,
> + "Error reading from remote server");
> +}
> +}
> +#if 0
>  if (APR_SUCCESS != (rv = ap_proxy_string_read(origin, bb, buffer, sizeof(buffer), &eos))) {
>  apr_socket_close(p_conn->sock);
>  backend->connection = NULL;
> @@ -667,7 +683,7 @@
>   "Error reading from remote server");
>  }
>  len = strlen(buffer);
> -
> +#endif
> /* Is it an HTTP/1 response?
>  * This is buggy if we ever see an HTTP/1.10
>  */
> 
> - Original Message -
> From: Adam Sussman [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Sent: Friday, December 28, 2001 8:24 PM
> Subject: [PATCH] mod_proxy infinite cpu eating loop
> 
> 
> > ap_proxy_string_read currently goes into an infinite loop when the proxied server
> > closes the connection without sending any data.  This patch fixes the problem
> > but I am not sure that this is the right way to do it.
> >
> > -adam
> >
> >
> > Index: modules/proxy/proxy_util.c
> > ===
> > RCS file: /home/cvspublic/httpd-2.0/modules/proxy/proxy_util.c,v
> > retrieving revision 1.73
> > diff -u -r1.73 proxy_util.c
> > --- modules/proxy/proxy_util.c 28 Nov 2001 21:07:32 - 1.73
> > +++ modules/proxy/proxy_util.c 29 Dec 2001 00:14:18 -
> > @@ -1039,6 +1039,7 @@
> >  APR_BUCKET_REMOVE(e);
> >  apr_bucket_destroy(e);
> >  }
> > +if (APR_BRIGADE_EMPTY(bb)) break;
> >  }
> >
> >  return APR_SUCCESS;
>
 







[PATCH] mod_proxy segfault

2001-12-28 Thread Adam Sussman


This patch addresses a segmentation fault that occurs in mod_proxy when the
proxied server returns either a bogus header line or a HTTP/0.9 response.

-adam


Index: modules/proxy/proxy_http.c
===
RCS file: /home/cvspublic/httpd-2.0/modules/proxy/proxy_http.c,v
retrieving revision 1.114
diff -u -r1.114 proxy_http.c
--- proxy_http.c    19 Dec 2001 16:32:01 -  1.114
+++ proxy_http.c    29 Dec 2001 00:10:46 -
@@ -801,7 +801,7 @@
 /* Is it an HTTP/0.9 response? If so, send the extra data */
 if (backasswards) {
 apr_ssize_t cntr = len;
-e = apr_bucket_heap_create(buffer, cntr, 0);
+e = apr_bucket_heap_create(buffer, cntr, 1);
 APR_BRIGADE_INSERT_TAIL(bb, e);
 }



[PATCH] mod_proxy truncates status line

2001-12-28 Thread Adam Sussman


Mod_proxy truncates the status line returned by the proxied
server.  One character gets snipped off of the end of the
status line.

-adam

Index: modules/proxy/proxy_http.c
===
RCS file: /home/cvspublic/httpd-2.0/modules/proxy/proxy_http.c,v
retrieving revision 1.114
diff -u -r1.114 proxy_http.c
--- proxy_http.c    19 Dec 2001 16:32:01 -  1.114
+++ proxy_http.c    29 Dec 2001 00:12:21 -
@@ -689,7 +689,7 @@
 "server: ", buffer, NULL));
 }
 backasswards = 0;
-buffer[--len] = '\0';
+buffer[len] = '\0';
 
 buffer[12] = '\0';
 r->status = atoi(&buffer[9]);



Stability problems in mod_proxy

2001-12-18 Thread Adam Sussman
