I did some performance measurements serving static files with Apache 2.0 on
AIX a few weeks back. I found that the static page serving performance was
roughly equal to what I was seeing with Apache 1.3 for small numbers of
concurrent clients (~30 or so). My instincts tell me we are losing a lot in
malloc/free and locking but gaining it back with sendfile(). I may be completely
wrong here, though; you never know until you profile it :-)
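For anyone who hasn't looked at it, the win from sendfile() is that the kernel
can push the file straight from the filesystem cache to the socket, with no
user-space copy and no per-request read buffers. A minimal sketch of the idea
(using the Linux-style sendfile(2) signature; AIX's send_file() differs, and
this isn't the actual core_output_filter code):

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/sendfile.h>   /* Linux-style sendfile(2) */
    #include <fcntl.h>
    #include <unistd.h>

    /* Push a static file to the client socket without copying it
     * through user-space buffers. */
    static int send_static_file(int client_sock, const char *path)
    {
        struct stat st;
        off_t offset = 0;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }
        while (offset < st.st_size) {
            ssize_t n = sendfile(client_sock, fd, &offset,
                                 st.st_size - offset);
            if (n <= 0)
                break;          /* real code would check errno/EAGAIN */
        }
        close(fd);
        return (offset == st.st_size) ? 0 : -1;
    }
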
The BIG win is in concurrent client support for a given amount of RAM. The
Apache 2.0 threaded MPM can support many MANY more concurrent clients before
we start swapping. This is not surprising, though...
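Just as a rough illustration (these numbers are made up, not measured): with
prefork, every concurrent client ties up a full child process, so at something
like 4 MB of private memory per child, 1000 concurrent clients means on the
order of 4 GB. With the threaded MPM an extra client costs roughly a thread
stack plus its request pool, a few hundred KB, so the same 1000 clients fit in
a few hundred MB before swapping starts.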
Bill
> Hi guys,
> I thought this may be of interest to the mailing list.
> ..Ian
>
> > -----Original Message-----
> > From: Brian Pane
> > Sent: Wednesday, February 28, 2001 9:33 PM
> > To: Ian Holsman
> > Subject: some reasons why Apache 2.0 threaded is slower than prefork
> >
> >
> > Ian--
> >
> > I found one of the big reasons why 2.0 in
> > threaded mode is a lot slower than prefork.
> > You may want to forward this info on to
> > new-httpd if you find it interesting.
> >
> > I started up your "threaded_cvs" version
> > of Apache on port 8080 on web3
> > and ran "truss -c" on the child process
> > (yes, child process, rather than processes;
> > it looks like the parent may have killed
> > off the rest because the load was light?)
> > while I used "ab" to hit the server with 20
> > concurrent requests.
> >
> > Attached is the output from truss, showing the
> > total time spent in various system calls.
> >
> > The poor thing is spending most of its
> > time in mutex operations. :-(
> >
> > I then ran gdb on it, set a breakpoint on
> > pthread_mutex_lock, and found some of the
> > places where it's trying to grab a lock:
> > 1. ap_read_request->apr_pool_create
> > 2. ap_bucket_socket_create->calloc
> > 3. ap_http_filter->socket_read->malloc
> > 4. ap_http_filter->socket_read->
> > apr_bucket_head->malloc
> > 5. a dozen more calls to malloc and
> > free during ap_read_request alone,
> > but I gave up trying to document them
> > all
> >
> > I'm surprised that it's doing so much malloc/free
> > work (rather than using a pool allocator). At least
> > on Solaris, all the malloc/free activity is going to
> > be a huge bottleneck, both because it uses a lot of
> > CPU and because of the mutexes in malloc and free.
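> >
> > To make the contrast concrete, a pool allocator along these
> > lines grabs one block per request and hands out pieces of it
> > with a pointer bump, so there is no mutex per allocation and
> > everything is released in one shot at the end of the request.
> > (Just a toy sketch of the idea, not APR's actual pool code.)
> >
> >     #include <stdlib.h>
> >
> >     /* Toy per-request pool: one malloc up front, lock-free
> >      * pointer-bump allocations, one free at end of request. */
> >     typedef struct {
> >         char  *base;
> >         size_t size;
> >         size_t used;
> >     } pool;
> >
> >     static pool *pool_create(size_t size)
> >     {
> >         pool *p = malloc(sizeof(*p) + size);
> >         if (!p)
> >             return NULL;
> >         p->base = (char *)(p + 1);
> >         p->size = size;
> >         p->used = 0;
> >         return p;
> >     }
> >
> >     static void *pool_alloc(pool *p, size_t n)
> >     {
> >         void *mem;
> >         n = (n + 7) & ~(size_t)7;       /* keep 8-byte alignment */
> >         if (p->used + n > p->size)
> >             return NULL;                /* real pools grow instead */
> >         mem = p->base + p->used;
> >         p->used += n;
> >         return mem;                     /* no mutex, no per-chunk free */
> >     }
> >
> >     static void pool_destroy(pool *p)
> >     {
> >         free(p);                        /* everything goes at once */
> >     }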
> >
> > --Brian
> >
> > syscall          seconds   calls  errors
> > _exit                .00       1
> > read                2.99    2173     560
> > write                .86     554
> > open                1.64    1102
> > close               2.12    1633
> > time                 .16     208
> > brk                  .11     224
> > stat                1.92    1616       1
> > getpid               .00       1
> > kill                 .00       1
> > ioctl                .93     530     530
> > fcntl               2.15    2207
> > poll                 .92    1093
> > sigaction            .00       4
> > waitid               .00      33      25
> > mmap                 .67     550
> > mprotect             .03      20
> > munmap               .58     530
> > writev              1.05     553
> > fork1                .08       7
> > sigtimedwait         .00       1
> > lwp_sema_wai        1.96    1149
> > lwp_sema_pos        1.54    1149
> > lwp_create           .06      40
> > lwp_continue         .04      20
> > lwp_mutex_un       12.46   11929
> > lwp_mutex_lo       11.61    9020
> > lwp_cond_wai         .04     188
> > lwp_cond_sig         .00      40
> > llseek              1.27     531
> > lwp_schedctl         .00      20
> > signotifywai         .10      22
> > lwp_sigredir         .05      21
> > fstat64             1.88    1592
> > open64               .66     530     530
> > accept               .59     531
> > shutdown             .56     531
> > getsockname          .69     531
> > getsockopt          1.21    1062
> > setsockopt          1.17    1062
> >                   ------   -----   -----
> > sys totals:        52.10   43009    1646
> > usr time:           9.41
> > elapsed:           53.36
> >
>