On 16 Jun 2011, at 12:01 AM, Paul Querna wrote:

> I think we have all joked on and off about 3.0 for... well about 8 years now.
>
> I think we are nearing the point we might actually need to be serious about it.
>
> The web is changed.
>
> SPDY is coming down the pipe pretty quickly.
>
> WebSockets might actually be standardized this year.
>
> Two protocols which HTTPD is unable to be good at. Ever.
>
> The problem is our process model, and our module APIs.

I am not convinced.

Over the last three years, I have developed a low level stream serving system that we use to disseminate diagnostic data across datacentres, and one of the basic design decisions was that it had to be lock free and event driven, because above all it needed to be fast. The event driven stuff was done properly, based on religious application of the following rule:

"Thou shalt not attempt any single read or write without the event loop giving you permission to do that single read or write first. Not a single attempt, ever."

From that effort I've learned the following:

- Existing APIs in Unix and Windows really, really suck at non-blocking behaviour. Standard APR file handling couldn't do it, so we couldn't use it properly. DNS libraries are terrible at it: the vast majority of "async" DNS libraries are just hidden threads wrapping attempts at blocking calls, which in turn means unknown resource limits are hit when you least expect it. Database and LDAP calls are blocking. What this means in practice is that you can't link to most of the software out there.

- You cannot block, ever. Think you can cheat and make a cheeky attempt to load that file quickly while nobody is looking? Your hard disk spins down, your network drive is slow for whatever reason, and your entire server stops dead in its tracks. We see this choppy behaviour in poorly written user interface code, and we see the same choppy behaviour in event driven webservers that cheat.

- You have zero room for error. Not a single mistake can be tolerated. Put one foot wrong, and the event loop spins. Put one foot wrong the other way, and the task you were doing evaporates. Finding these problems is painful, and your server is unstable until you do.

- You have to handle every single possible error condition. Every single one. Miss one? You suddenly drop out of an event handler, your event loop spins, or the request becomes abandoned. You have no room for error at all (see the sketch after this list).
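
To show what "handle every single error condition" means in practice, here is a sketch of the read side of such a handler, matching the loop sketched earlier; conn_process and conn_close are made-up placeholder names, not code from our system. Every possible outcome of the single read is mapped to an explicit decision, because any outcome left unhandled is exactly the spinning loop or abandoned request described above.

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    typedef struct conn { int fd; char buf[8192]; } conn_t;

    void conn_process(conn_t *c, ssize_t nbytes);  /* hand the bytes onwards   */
    void conn_close(conn_t *c);                    /* tear the connection down */

    /* Called only once the event loop has granted permission for one read. */
    void on_readable(conn_t *c)
    {
        ssize_t n = read(c->fd, c->buf, sizeof(c->buf));  /* one read, no retry loop */

        if (n > 0) {
            conn_process(c, n);     /* got data: pass it on, then return */
        } else if (n == 0) {
            conn_close(c);          /* peer closed: clean up, don't abandon it */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* spurious wakeup: nothing to do, wait for the next permission */
        } else if (errno == EINTR) {
            /* interrupted before any data arrived: wait for the next permission */
        } else {
            conn_close(c);          /* genuine error: clean up, never spin */
        }
    }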

We have made our event driven code work because it does a small number of simple things, it is designed to do those simple things well, and we need it to be as compact and fast as humanly possible, given that datacentre footprint is our primary constraint.

Our system is like a sportscar: it's fast, but it breaks down if we break the rules. For us that trade-off is worth it; we are prepared to abide by the rules to achieve the speed we need.

Let's contrast this with a web server.

Webservers are traditionally fluid beasts that have been, and continue to be, moulded and shaped by the many, many ever-changing requirements of webmasters. They have been made modular and extensible, and those modules and extensions are written by people of differing programming ability, to different tolerances, within very different budget constraints.

Simply put, webservers need to tolerate error. They need to be built like tractors.

Unreliable code? We have to work despite that. Unhandled error conditions? We have to work despite that. Code that was written in a hurry on a budget? We have to work despite that.

Are we going to be sexy? Of course not. But while the sportscar is broken down at the side of the road, the tractor just keeps going.

Why does our incredibly unsexy architecture help webmasters? Because prefork is bulletproof. Leak, crash, explode, hang: the parent will clean up after us. Whatever we do, within reason, doesn't affect the process next door. If things get really dire, we're delayed for a while, and we recover when the problems pass. Does the server die? Pretty much never. What if we trust our code? Well, worker may help us. Crashes do affect the requests next door, but if they're rare enough we can tolerate it. The event mpm? It isn't a truly event driven mpm; rather, it is more efficient when it comes to keepalives and waiting for connections, because we hand that problem to an event loop that doesn't run anyone else's code within it, so we're still reliable despite the need for a higher standard of code accuracy.
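
To illustrate what "the parent will clean up after us" buys you, here is a stripped-down sketch of the prefork shape. This is not httpd's actual MPM code, just the general pattern: each child serves requests in isolation, and the parent's only job is to reap whatever dies and start a replacement, so a leak or a crash costs one child rather than the server.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NUM_CHILDREN 5

    static void child_main(void)
    {
        /* accept() and serve requests here; a crash only takes out this child */
        for (;;)
            pause();    /* placeholder for the real accept/handle loop */
    }

    static pid_t spawn_child(void)
    {
        pid_t pid = fork();
        if (pid == 0) {      /* child: never returns to the parent's loop */
            child_main();
            _exit(0);
        }
        return pid;          /* parent: -1 on failure, child pid otherwise */
    }

    int main(void)
    {
        for (int i = 0; i < NUM_CHILDREN; i++)
            spawn_child();

        /* The parent's only job: reap whatever dies and replace it. */
        for (;;) {
            int status;
            pid_t dead = waitpid(-1, &status, 0);
            if (dead < 0)
                break;                   /* no children left to wait for */
            fprintf(stderr, "child %d exited, starting a replacement\n", (int)dead);
            spawn_child();
        }
        return 0;
    }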

If you've ever been in a situation where a company demands more speed out of a webserver, wait until you sacrifice reliability to give them that speed. Suddenly they don't care about the speed; reliability becomes the top priority again, as it should be.

So, to get round to my point: if we decide to take a fresh look at the architecture for v3.0, we should be careful to ensure that we don't stop offering a "tractor mode", as this mode is our killer feature. There are enough webservers out there that try to be event driven and sexy, and then fall over on reliability. Alternatively, there are webservers out there that try to be event driven and sexy and succeed, because they keep their feature set modest, keep extensibility to a minimum, and avoid blocking calls to disks and other blocking devices. Great for load balancers, not so great for anything else.

Apache httpd has always had at its heart the ability to be practically extensible while remaining reliable, and I think we should continue to do that.

Regards,
Graham