I did some performance measurements serving static files with Apache 2.0 on
AIX a few weeks back. I found that the static page serving performance was
roughly equal to what I was seeing with Apache 1.3 for small numbers of
concurrent clients (~30 or so). My instincts tell me we are losing a lot in
malloc/free and locking but gaining it back with sendfile(). I may be completely
wrong here, though; you never know until you profile it :-)
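For anyone who hasn't looked at it, the win from sendfile() is that the kernel
can push the file straight from the filesystem cache to the socket, with no
user-space copy and no per-request read buffers. A minimal sketch of the idea
(using the Linux-style sendfile(2) signature; AIX's send_file() differs, and
this isn't the actual core_output_filter code):

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/sendfile.h>   /* Linux-style sendfile(2) */
    #include <fcntl.h>
    #include <unistd.h>

    /* Push a static file to the client socket without copying it
     * through user-space buffers. */
    static int send_static_file(int client_sock, const char *path)
    {
        struct stat st;
        off_t offset = 0;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return -1;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }
        while (offset < st.st_size) {
            ssize_t n = sendfile(client_sock, fd, &offset,
                                 st.st_size - offset);
            if (n <= 0)
                break;          /* real code would check errno/EAGAIN */
        }
        close(fd);
        return (offset == st.st_size) ? 0 : -1;
    }
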
The BIG win is in concurrent client support for a given amount of RAM. The
Apache 2.0 threaded MPM can support many MANY more concurrent clients before
we start swapping. This is not surprising, though...
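Just as a rough illustration (these numbers are made up, not measured): with
prefork, every concurrent client ties up a full child process, so at something
like 4 MB of private memory per child, 1000 concurrent clients means on the
order of 4 GB. With the threaded MPM an extra client costs roughly a thread
stack plus its request pool, a few hundred KB, so the same 1000 clients fit in
a few hundred MB before swapping starts.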
Bill
> Hi guys,
> I thought this may be of interest to the mailing list.
> ..Ian
>
> > -----Original Message-----
> > From: Brian Pane
> > Sent: Wednesday, February 28, 2001 9:33 PM
> > To: Ian Holsman
> > Subject: some reasons why Apache 2.0 threaded is slower than prefork
> >
> >
> > Ian--
> >
> > I found one of the big reasons why 2.0 in
> > threaded mode is a lot slower than prefork.
> > You may want to forward this info on to
> > new-httpd if you find it interesting.
> >
> > I started up your "threaded_cvs" version
> > of Apache on port 8080 on web3
> > and ran "truss -c" on the child process
> > (yes, child process, rather than processes;
> > it looks like the parent may have killed
> > off the rest because the load was light?)
> > while I used "ab" to hit the server with 20
> > concurrent requests.
> >
> > Attached is the output from truss, showing the
> > total time spent in various system calls.
> >
> > The poor thing is spending most of its
> > time in mutex operations. :-(
> >
> > I then ran gdb on it, set a breakpoint on
> > pthread_mutex_lock, and found some of the
> > places where it's trying to grab a lock:
> > 1. ap_read_request->apr_pool_create
> > 2. ap_bucket_socket_create->calloc
> > 3. ap_http_filter->socket_read->malloc
> > 4. ap_http_filter->socket_read->
> > apr_bucket_head->malloc
> > 5. a dozen more calls to malloc and
> > free during ap_read_request alone,
> > but I gave up trying to document them
> > all
> >
> > I'm surprised that it's doing so much malloc/free
> > work (rather than using a pool allocator). At least
> > on Solaris, all the malloc/free activity is going to
> > be a huge bottleneck, both because it uses a lot of
> > CPU and because of the mutexes in malloc and free.
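> >
> > To make the contrast concrete, a pool allocator along these
> > lines grabs one block per request and hands out pieces of it
> > with a pointer bump, so there is no mutex per allocation and
> > everything is released in one shot at the end of the request.
> > (Just a toy sketch of the idea, not APR's actual pool code.)
> >
> >     #include <stdlib.h>
> >
> >     /* Toy per-request pool: one malloc up front, lock-free
> >      * pointer-bump allocations, one free at end of request. */
> >     typedef struct {
> >         char  *base;
> >         size_t size;
> >         size_t used;
> >     } pool;
> >
> >     static pool *pool_create(size_t size)
> >     {
> >         pool *p = malloc(sizeof(*p) + size);
> >         if (!p)
> >             return NULL;
> >         p->base = (char *)(p + 1);
> >         p->size = size;
> >         p->used = 0;
> >         return p;
> >     }
> >
> >     static void *pool_alloc(pool *p, size_t n)
> >     {
> >         void *mem;
> >         n = (n + 7) & ~(size_t)7;       /* keep 8-byte alignment */
> >         if (p->used + n > p->size)
> >             return NULL;                /* real pools grow instead */
> >         mem = p->base + p->used;
> >         p->used += n;
> >         return mem;                     /* no mutex, no per-chunk free */
> >     }
> >
> >     static void pool_destroy(pool *p)
> >     {
> >         free(p);                        /* everything goes at once */
> >     }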
> >
> > --Brian
> >
> > syscall          seconds   calls  errors
> > _exit                .00       1
> > read                2.99    2173     560
> > write                .86     554
> > open                1.64    1102
> > close               2.12    1633
> > time                 .16     208
> > brk                  .11     224
> > stat                1.92    1616       1
> > getpid               .00       1
> > kill                 .00       1
> > ioctl                .93     530     530
> > fcntl               2.15    2207
> > poll                 .92    1093
> > sigaction            .00       4
> > waitid               .00      33      25
> > mmap                 .67     550
> > mprotect             .03      20
> > munmap               .58     530
> > writev              1.05     553
> > fork1                .08       7
> > sigtimedwait         .00       1
> > lwp_sema_wai        1.96    1149
> > lwp_sema_pos        1.54    1149
> > lwp_create           .06      40
> > lwp_continue         .04      20
> > lwp_mutex_un       12.46   11929
> > lwp_mutex_lo       11.61    9020
> > lwp_cond_wai         .04     188
> > lwp_cond_sig         .00      40
> > llseek              1.27     531
> > lwp_schedctl         .00      20
> > signotifywai         .10      22
> > lwp_sigredir         .05      21
> > fstat64             1.88    1592
> > open64               .66     530     530
> > accept               .59     531
> > shutdown             .56     531
> > getsockname          .69     531
> > getsockopt          1.21    1062
> > setsockopt          1.17    1062
> >                   ------   -----   -----
> > sys totals:        52.10   43009    1646
> > usr time:           9.41
> > elapsed:           53.36
> >
>