On Tue, 29 Jun 1999, Ingo Molnar wrote:

> to be fair to NT, NT still performs better than Linux if only a few static
> pages are served - but thats just a small part of the picture. Especially
> when it comes to anything else than serving a few static files: 
> 
>    In this setup, the freeware system clearly shows better results: While
>    NT can hardly manage more than 30 requests per second, Linux can
>    handle more than 166. With 512 client processes, it even manages 274
>    pages per second. [...]
> 
> note that the above results are _still_ static files, just a little bit
> more complex and more RL set of files and set of requests. When doing CGI:
> 
>    During the CGI tests, our NT server suffered massive performance
>    losses. Although both NT and Linux serve more than twice as many
>    dynamic pages with four CPUs as they do with one, NT in SMP mode is
>    still just under half as fast as Linux with only one CPU. [...]
> 
> probably the most interesting thing c't found was that a threaded web
> server like IIS has serious design flaws wrt. dynamic execution:
> 
>    NT can't do more than seven pages per second. This is probably where
>    IIS design comes into play, which unlike Apache works with threads.
>    Normally, web servers with internal threads are meant to work more
>    efficiently due to the reduced system overhead. However, once all
>    threads are blocked, the server cannot process any further requests.
>    [...]
> 
> there is still work to be done to get Linux perform even better, and this
> work is going on as we speak. Many thanks to c't for pointing out sore
> spots.
> 
> -- mingo

I'm sorry to be coming in on this thread so late, but I've been on
vacation.  A question:  Has any effort gone into analyzing the
bottlenecks behind the c't results?  If I understand the article
correctly, Linux is generally superior to NT until multiple ethernet
interfaces come into play.  I'm having a hard time reconciling the
single-interface results (figure "1"), which show the two OS's tracking
each other almost perfectly, with the multiple-interface figure.

In particular, it looks like Linux is getting only 30% or so of the
benefit of a second interface, while NT is getting around 110% of the
benefit of a second interface.  That is, NT with two interfaces is
slightly >>more<< than twice as fast as NT with only one, while Linux is
apparently choking on the second interface so that the per-interface
efficiency is dropping to only 65% of the single interface level.  This
makes little sense to me in either case.
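The per-interface arithmetic behind those percentages can be made
explicit.  This is a minimal sketch; the 1.30x and 2.10x speedups are
illustrative values read off the graphs as described above, not numbers
from the article's tables:

```python
# Per-interface scaling arithmetic for the two-NIC results.  The
# speedup factors below are assumptions inferred from the graph, chosen
# to reproduce the 65% and ~105% per-interface figures quoted above.

def per_interface_efficiency(total_speedup, n_interfaces):
    """Throughput per interface, relative to the single-interface rate."""
    return total_speedup / n_interfaces

linux_speedup = 1.30   # assumed: Linux with 2 NICs at ~1.3x its 1-NIC rate
nt_speedup    = 2.10   # assumed: NT with 2 NICs slightly more than 2x

print(per_interface_efficiency(linux_speedup, 2))  # 0.65 -> 65% efficiency
print(per_interface_efficiency(nt_speedup, 2))     # 1.05 -> superlinear
```

Anything above 1.0 per interface is superlinear, which is exactly the
NT anomaly in question.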

It is >>very<< hard to see why NT (or anything else) would get a greater
than linear gain from additional interfaces unless it is playing some
peculiar trick that benefits from an accident in the test configuration.
It is also very hard to see why Linux degrades to only 65% of its
per-interface performance with a second interface when the total
requests/second are so far from saturation of a 100BT interface.  

In the single interface graph, it looks like at most 4 MB/sec are being
moved by either NT or Linux on an interface capable of at most around 12
MB/sec -- probably three packets at a time (4K broken up into <1.5K
packets) plus the HTTP request packet.  This is not particularly close
to saturating the server interface, but it might well create problems
at the switch -- multiple packets bound for the server are almost
certainly being funneled in by the switch simultaneously -- and c't
doesn't mention which switch(es) were used.  I also wish very much
that c't had provided the server CPU load average(s) on the same figure,
in parallel with the responses/second, if not even more information.
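The back-of-the-envelope numbers above can be checked directly.  A
sketch, assuming a 4 KB page, a typical ~1460-byte TCP payload per
Ethernet frame (my assumption, not a figure from the article), and the
~4 MB/sec peak read off the graph:

```python
import math

# How many data packets does a ~4 KB static page take, and how close
# is the observed rate to saturating the server's 100BT interface?

PAGE_BYTES = 4 * 1024     # ~4 KB static page, as discussed above
TCP_MSS    = 1460         # assumed: 1500-byte MTU minus IP/TCP headers

packets_per_response = math.ceil(PAGE_BYTES / TCP_MSS)
print(packets_per_response)        # 3 data packets, plus the request

observed_mb_s   = 4.0              # rough peak from the graph
wire_limit_mb_s = 12.0             # ~100 Mbit/s after framing overhead
print(observed_mb_s / wire_limit_mb_s)   # ~0.33: well short of saturation
```

A third of wire speed leaves plenty of headroom on the server side,
which is why the switch and clients become the prime suspects.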

The SMP results and text suggest that the result is not likely to be CPU
bound but is rather likely to be saturated "elsewhere".  But where?  If
the c't assertion that it is saturated at the client/network level is
indeed correct, then their whole test becomes invalid as then it isn't
testing the server per se at all.  c't should have used
better/faster/more clients (better than 486's, for sure) and probably
should have introduced some deliberate "noise" into the request pattern
timing from each client to avoid getting accidental synchronizations of
client/server/switch response patterns (properly averaging over switch
contention, if you like).

However, if the single interface results are indeed not CPU bound, it
becomes very difficult indeed to understand the multiple interface
results.  How in the world does NT do LESS serial CPU work per request
served on a single CPU with two interfaces to manage and still benefit
from multiple CPUs?  Recalling that a single interface by itself didn't
benefit from multiple CPUs, how can it reap a CPU benefit from multiple
interfaces and still reap a second benefit from multiple CPUs?  We are
forced to conclude that NT running two interfaces on two CPUs SMP is
more efficient than NT running one interface on one CPU, two times.
This does not compute.  There is an accidental synchronization or
unexpected bottleneck somewhere that is affecting these results.

The linux results are almost as puzzling.  I can believe a drop off in
average performance with the addition of a second network interface, but
to 65% of the single interface value fairly far from interface
saturation?  This is worse than I would have expected even for a 2.0.x
kernel.  This somehow suggests that the server is CPU and/or strongly kernel
bound.  The crossover evident as the clients go from 32 to 64 processes
suggest that at 32 the bound is kernel, and above it is CPU (at least,
from the graph, there is some benefit from more CPUs that tails over as
one expects).

This is still not very consistent as a picture, though.  There are a
number of other puzzling features of the performance graph that make me
doubt the linux results almost as much as I doubt the NT results.  There
is something else going on here -- I don't believe that they are
measuring what they think that they are measuring.

Things I'd like to see to clarify this:

  a) Plots of the various performance measures that may be contributing
to the bottlenecks at each point, on the same (or parallel) graphs.  In
particular, I'd like to see at LEAST load average and the number of
packets/second being received and transmitted, per interface.  Memory
utilization and disk activity would also be useful, although perhaps
harder to quantify/average and include.

  b) Some effort made to properly determine the CLIENT contribution to
the overall performance and ensure that it is unbiased.  It is not
enough to say that the clients are "identical in both cases" because
there can be accidental synchronicities in network response -- induced
antibunching of the client request patterns, if you like -- that negate
the obvious assumption that the request pattern is Poissonian and hence
ignorable and averageable.  At the very least, I'd use clients whose
network interfaces are not themselves saturated in any dimension by the
test, and I'd introduce deliberate random delays into the client test
pattern to at least try to force a Poissonian distribution of incoming
packets.  Humans are so slow that they are effectively Poissonian, but
computerized loops are both fast and regular -- the test conditions are
intended to emulate requests being made randomly by humans from far more
interfaces and with far more random delays.

  c) Similarly, I'd use a switch that is not saturated in any dimension
by the test, and I'd determine this with independent (non-web-based)
tests.  Or I'd use several switches and identify them by name.  I'd
probably also use switched 100BT on the (fast!) clients to really
determine how much of the measured server performance is related to 10BT
bottlenecks on the 10BT/client side and has nothing to do with the
server.  For all we know, the first figure especially may be measuring
NOTHING but switch performance and may have little or nothing to do with
the servers.
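The "deliberate random delays" idea in (b) amounts to drawing
inter-request gaps from an exponential distribution, so that each
client's request stream is approximately Poissonian rather than a
tight, regular loop that invites accidental client/switch
synchronization.  A minimal sketch (the rate and request count are
hypothetical placeholders, not values from the test):

```python
import random

# Generate Poissonian request times: exponential inter-request gaps
# give a (approximately) Poisson arrival process, breaking up the
# lockstep timing that a bare benchmark loop produces.

def poisson_request_times(rate_per_sec, n_requests, rng=random):
    """Return n_requests cumulative request times with exponential gaps."""
    t, times = 0.0, []
    for _ in range(n_requests):
        t += rng.expovariate(rate_per_sec)  # mean gap = 1/rate_per_sec
        times.append(t)
    return times

random.seed(42)  # deterministic for illustration
times = poisson_request_times(rate_per_sec=10.0, n_requests=1000)
print(len(times))              # 1000
print(times == sorted(times))  # True: times are non-decreasing
```

A real client would sleep until each scheduled time before issuing its
HTTP request; the point is only that the gaps are random, not fixed.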

With all that said, I do think that the c't tests are far better
conceived and executed than the Mindcruft results.  I hope that they
don't leave the issue where it is left but rather pursue it still
further to address some of the points above.  It may well be that NT
with two interfaces performs as they describe while linux "flops" with
multiple interfaces, but I'd like very much to see the point of failure
neatly isolated and the superlinear interface speedup observed in NT
explicated.

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]



-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]
