Hi there,

On Mon, 29 Jan 2001 [EMAIL PROTECTED] wrote:

> I don't want to prolong this discussion longer than necessary, however "a
> large webfarm with really good loadbalancing" indicates you are running
> several servers and a load balancer. If we are looking at an individual
> server (which of course we must be if we are talking about a PCI card) then
> my comments still stand, that the crypto card is working many times faster
> than the processor can at present. However, real on-chip support for SSL
> encryption would be quicker still, but we don't have that. IMHO the card is
> better value for money than extra servers etc.

Depends. A 1GHz Athlon CPU can, on its own, crank out nearly 200 1024-bit
RSA private key operations a second. So using a dual-CPU machine
effectively gives you the extra "grunt" (read: capacity) you'd get from
a 200 ops/sec card, *except* that the second CPU can share in all duties
with the first CPU, rather than segregating the roles (ie. this CPU does
*only* crypto, and this CPU does *only* everything else). I would guess
that a decent motherboard, 256MB of RAM, and 2 to 4 1GHz CPUs (or
thereabouts) may seem like overkill on the surface, but when you line the
costs up you may find it not so pointless after all. And remember,
accelerator cards generally can't run PHP, mod_perl, Java etc - whereas an
extra CPU in your host *can*.
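If you want to sanity-check that 200 ops/sec figure on your own hardware,
"openssl speed rsa1024" will give you signing rates directly, or you can
time it yourself. Here's a rough sketch of my own (not anything from
mod_ssl - just the plain OpenSSL RSA API, compile with -lcrypto):

    /* Rough timing sketch: 1024-bit RSA private key (sign) ops/sec.
     * Illustration only - error checking omitted for brevity.
     */
    #include <stdio.h>
    #include <time.h>
    #include <openssl/rsa.h>
    #include <openssl/objects.h>

    int main(void)
    {
        /* throwaway 1024-bit key, public exponent F4 (65537) */
        RSA *rsa = RSA_generate_key(1024, RSA_F4, NULL, NULL);
        unsigned char digest[20] = { 0 };   /* dummy SHA-1 digest */
        unsigned char sig[128];             /* 128 = RSA_size for 1024-bit */
        unsigned int siglen;
        int i, n = 1000;
        clock_t t0 = clock();

        for (i = 0; i < n; i++)
            RSA_sign(NID_sha1, digest, sizeof(digest), sig, &siglen, rsa);

        printf("%.0f private key ops/sec\n",
               n / ((double)(clock() - t0) / CLOCKS_PER_SEC));
        RSA_free(rsa);
        return 0;
    }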

However, purism aside ... :-) ... what is more important in this whole
equation is Apache - it is an absolute *PIG* for SSL acceleration.
Although your host machine may be gaining a lot in the way of free CPU
cycles by offloading to the card, it is picking up a penalty of a
different kind. Apache obviously can't do non-blocking IO for
parallelism, which is a shame. Apache (as released) can't do
multi-threading yet either, which would otherwise at least lower the
overhead of parallelism, and that is also a shame. So you have a
situation where each individual stream occupies a dedicated process for
its duration (which is why keepalives should never be turned on in
Apache - it becomes almost trivial to DoS an Apache server with any
crusty old 8-bit box from the 80s and a TCP/IP stack). The point is that
although your host machine's CPU is "idle" whilst a crypto operation is
taking place on the card, leaving you the impression that your machine
has more time/capacity for other things, you have in fact lost an
important resource - a process. That process cannot and will not get on
with anything else whilst the crypto operation is happening (or in
transit between the kernel and the card). To actually use that "spare
time" on the host machine adequately requires you to run more processes
concurrently, given that many of them will spend a reasonable proportion
of their time blocked. Hence my sarcastic comments about threading and
non-blocking IO - if the latter were available this wouldn't be a
problem at all. Processes are not free.
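To illustrate what I mean by non-blocking IO (a toy sketch of the
general select() technique, nothing to do with Apache's actual code): a
single process can keep many connections "in flight" at once, so a
stalled client - or a crypto operation off on a card - doesn't cost you
a whole process:

    /* Toy single-process multiplexing server using select(). One
     * process watches many sockets; a slow client just sits in the
     * fd_set instead of pinning a dedicated httpd process.
     */
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        fd_set master, readable;
        int fd, maxfd;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);
        bind(listener, (struct sockaddr *)&addr, sizeof(addr));
        listen(listener, 16);

        FD_ZERO(&master);
        FD_SET(listener, &master);
        maxfd = listener;

        for (;;) {
            readable = master;
            select(maxfd + 1, &readable, NULL, NULL, NULL);
            for (fd = 0; fd <= maxfd; fd++) {
                if (!FD_ISSET(fd, &readable))
                    continue;
                if (fd == listener) {
                    int c = accept(listener, NULL, NULL);
                    FD_SET(c, &master);
                    if (c > maxfd)
                        maxfd = c;
                } else {
                    char buf[512];
                    if (read(fd, buf, sizeof(buf)) <= 0) {
                        close(fd);            /* client gone */
                        FD_CLR(fd, &master);
                    }
                    /* else: handle the request without blocking
                     * any of the other connections */
                }
            }
        }
    }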

In many ways, the best cost-performance equation in Apache is to just get
the SSL done in-line as fast as possible. Your processes are committed to
their individual client connections anyway, and offloading the crypto
elsewhere (with its inherent latency) only gains you time and resources on
the host machine if you're prepared to maintain a lot more concurrent
processes - ie. parallelism to counter the latency ... (which in turn
starts to get the kernel, and the system in general, a bit bogged down).
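To put some numbers on that (invented, back-of-envelope numbers purely
for illustration): the count of busy processes you need is roughly your
request rate multiplied by how long each request holds its process, so
added latency translates directly into added processes:

    busy processes ~= requests/sec x seconds per request

    100 req/sec x 0.25 sec each = ~25 processes (crypto in-line)
    100 req/sec x 0.40 sec each = ~40 processes (same work plus 150ms
                                   of card/kernel round-trip latency)

The card may well have freed up CPU cycles, but you pay for them back in
process count.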

If I hadn't already made my point clearly (as I'm often struggling to do
:-), Apache is not latency friendly. You have the same architectural
headaches if you're using back-end databases or application servers where
the back-end requests from the webserver have high latency - you end up
needing truck-loads of httpd processes running just to maintain a modest
work-load. On most modern (but modest) hardware running most modern
operating systems, performance ceases to scale linearly (at least ceases
to do so visibly) anywhere in the ballpark of 50-200 processes. If you get
200 processes running on a standard Intel/Linux setup, I think your pure
throughput capacity will be a lot lower than with, say, 50 processes. The
fact that operations occasionally require synchronisation (eg. for access
to a PCI device, shared memory segment, disk activity, whatever), combined
with the fact that the kernel has to handle all the scheduling, network
I/O, etc for every one of those processes - it all adds up, and at some
point the overhead of parallelism itself starts to eat away at your
application's performance. Note also that Linux has a pretty "thin" fork()
compared to some other systems ... But the problem is that if your
architecture has latency (eg. offloading crypto, using external databases,
nfs-mounted files, etc) you may have no choice but to use more processes.
So, conceptually at least, it makes sense that if you *have* fork()ed off
httpd processes, it's simply best to keep them as busy as possible all the
time to ensure the resource itself (and its effect on kernel/system
overheads) is not wasted. This adds to the argument for using multiple
CPUs rather than dedicated acceleration hardware. Then again, it all
depends on your numbers, your applications, and your needs.
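For what it's worth, the config side of keeping the process count sane
looks something like this (the directive names are standard Apache 1.3;
the numbers are just my illustrative guesses - tune to your own load):

    # Don't let idle clients pin processes (see the DoS comment above)
    KeepAlive Off

    # Cap the process count below where scheduling/sync overhead bites
    MaxClients       100
    StartServers     20
    MinSpareServers  10
    MaxSpareServers  30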

> However, it wouldn't be so much of an issue if IE didn't exist! 

?? Didn't follow that, sorry.

Cheers,
Geoff

______________________________________________________________________
Apache Interface to OpenSSL (mod_ssl)                   www.modssl.org
User Support Mailing List                      [EMAIL PROTECTED]
Automated List Manager                            [EMAIL PROTECTED]