On Mon, 2006-08-07 at 12:19 -0700, Brian McCallister wrote:
> I am trying to understand why Mongrel so forcefully disables http
> pipelining. The docs say because the spec is unclear, and it hurts
> performance. These reasons smell... wrong. The HTTP spec is pretty
> clear, and, er, I cannot find anywhere else that claims there is a
> performance drawback, and lots of studies (and personal benchmarks
> across years of writing webapps) showing how much it helps.

The problem is performance, resources, and usage related.

First, Ruby's IO subsystem isn't that great at processing HTTP-style
protocols: you have to parse off a chunk, then parse more chunks, and
since there's no decent ring buffer that means tons of String creation.
I've worked on this a bit, but it's a real pain, so I focused on just
making Mongrel work well in the simple case.
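To put a picture on that, here's roughly what reading one request
looks like without a ring buffer (a minimal sketch, not Mongrel's
actual parser -- the buffer size is arbitrary):

    require 'socket'

    # Read until the blank line that ends the headers. Every readpartial
    # hands back a brand-new String, and the final split copies the data
    # yet again -- there's no way to parse in place.
    def read_header(sock)
      buf = ''
      until buf.include?("\r\n\r\n")
        buf << sock.readpartial(1024)   # a fresh String per chunk
      end
      buf.split("\r\n\r\n", 2)          # two more copies: header, leftover
    end

Every request repeats that allocation churn, and the garbage collector
pays for it.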
Second, Ruby only gets 1024 file descriptors for *all* open files; in
practical usage on a Rails server that's about 500 sockets before the
server tanks badly. Allowing clients to keep sockets open means they
can very easily crash Ruby just by never closing them. As it is now,
Mongrel has to boot clients that take too long in order to keep
service levels high, and it would get much more complex in a
pipeline/keepalive situation where sockets stay open. Throw in
threading issues around Rails, socket and file usage by random
authors, and problems with how pipelined resources are handled (not by
Mongrel, but by the frameworks) and you've got a total mess. It's just
simpler to process one request and go away.
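A sketch of how cheap that failure is to trigger (assuming a server on
localhost:3000 -- the host and port here are made up):

    require 'socket'

    # Open connections and never close them. Enough copies of this and
    # accept() starts failing on the server; run it in a single process
    # and the client itself hits Errno::EMFILE near the 1024 FD ceiling.
    socks = []
    begin
      loop { socks << TCPSocket.new('localhost', 3000) }
    rescue SystemCallError => e
      puts "stopped after #{socks.size} sockets (#{e.class})"
    ensure
      socks.each { |s| s.close rescue nil }
    end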
Third, Mongrel is most often used behind another, more capable server
and off localhost. Mongrel's not intended to be a full-blown server,
but rather just small enough and fast enough to get a Ruby application
going. Rather than waste a lot of resources on making Mongrel handle
all the nuances of the HTTP RFC, I went and implemented what worked
fastest in *this* situation. This also indirectly helps with a common
queuing problem where a series of pipelined requests lets one client
take over a backend, shutting out many others. It turns out that in a
clustering situation most of the requests Mongrel handles are better
off sprayed around to multiple servers so that all clients get a fair
chance at service.

Those would be the reasons right now. Things may change in the future
when the technology landscape for Mongrel changes, but until then it's
enough work to just get this simplest case going well.

> The only common case I can think of for getting a possible
> performance boost from forcing a connection close is if you know with
> certainty that there are no followup resource requests to the same
> domain, and the cost of maintaining connection state in memory is too
> high for the app server. This holds true for folks like Yahoo! or
> whatnot who use a CDN for resources (and use pipelining on the CDN
> connections) and separate app servers for the dynamic page elements,
> but... it seems to be a strange assumption for a web server to force
> on users.

Again, forcing the connection closed works better in this situation
because it's expected to work on localhost, and there's no statistical
difference in performance there. But if you're reading through the
spec you might be able to help me out, since I'm writing a test suite
for this very purpose (and then exploits around it). If you can, help
me find the answers to these:

1) Can a client perpetually send pipelined requests, eating up the
available socket descriptors (remember, Ruby's only got 1024 available
FDs, about 500 sockets in practical usage)?

2) Can a client send 20 or 30 requests right away, not process any
responses, and then suddenly close?

3) Can a client "trickle" requests (send them very slowly and very
chunked) in such a way that the server has to perform tons of
processing?

4) Who closes? It's not clear whether the client closes, the server
closes, who's allowed to close, when, or in what situations. This is
really unclear but incredibly important in a TCP/IP protocol, and in
the HTTP RFC it's hidden in little SHOULD and MAY statements in all
sorts of irrelevant sections.

5) What are the official size limits of each element of HTTP? Can
someone send a 1M header element?

6) Why are servers required to be nice to malicious clients? All over
the spec are things where the server is required to read all of the
client's garbage and then politely return an error. With DDoS you'd
think this would change. So when is it appropriate for a server to be
mean in order to protect itself?

7) What's the allowed time limit for a client to complete its request?

8) Are pipelined requests all sent at once and then all processed at
once? Or are they sent/processed/sent/processed in keeping with HTTP
request/response semantics?

  a) If a client can pipeline 20 requests, but request #3 causes an
  error that requires the client be closed, does the server have to
  process the remaining 17 before responding (see #6)?

  b) If a client does request/response, then why have pipelining at
  all?

  c) How does a client make 20 requests, and then after getting #6
  abort the remaining 13?

  d) What does the server do with all the resources it's gathered up
  if the socket is closed?

  e) The server can't just start sending, since client receive buffers
  and server send buffers are finite and set by the OS. If that's the
  case, then either the server has to queue up all the responses and
  send them when the client is done, or the client has to do
  request/response.

  f) If they do request/response, how do they synchronize the
  processing? It's a catch-22 if you say they can send 20 pipelined
  requests, but in actuality, due to send/recv buffers, they have to
  also process responses at the same time. Without a clear decision on
  this it's very difficult, and pretty much either side can just stop
  processing without the other side knowing.

9) If both sides just keep sockets open and process whatever comes
their way, then what prevents a malicious client or server from doing
nothing and eating up resources?

10) If there are pipelined requests and responses, then why are there
chunked encoding, multipart MIME, byte ranges, and other mechanisms
for doing nearly the same thing?

11) If it's not explicitly declared that both sides will pipeline, and
neither side needs to declare the size of its content, then what
prevents both sides from sending tons of junk? How does either side
really know the end of a request?

That's from my latest notes. As you can see, most of the problems
encountered tend to come from a lack of clarity in these areas:

* Asynchronous vs. synchronous processing.

* Request/response vs. batch vs. spray and pray. :-)

* Abuse of resources by clients.

* Changes in the technology landscape since 1999 that make it so
servers are at a major disadvantage (DDoS, baby).

* A lack of understanding of the needs of web applications like
Mongrel, which typically run on localhost or highly controlled
networks where much of this isn't necessary and only adds complexity.

* Not anticipating that the *real* performance problem in web
applications is *not* TCP/IP connection time, but rather the slow
nature of dynamic page generation (can we get something other than
ETag please?).
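To make #2 and #8 concrete, here's a toy client that pipelines a burst
and walks away (localhost:3000 is again an assumption, and the counts
are arbitrary):

    require 'socket'

    # Pipeline 30 GETs in a single write, read nothing, then vanish
    # (question #2). The server is left deciding what to do with any
    # responses it has already generated for a client that's gone.
    sock = TCPSocket.new('localhost', 3000)
    req  = "GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"
    sock.write(req * 30)
    sleep 2      # let the server chew on them and fill its send buffer
    sock.close   # abort without reading a single byte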
> Anyway, trying to understand why it works this way. Anyone know?

Yeah, you know what we should do, and you might get a kick out of
this: I'm working on a test suite in RFuzz that explores all the parts
of the RFC. I've got sections 3 and 4 laid out and ready to be filled
in, with more to come. It basically goes through each part and makes
sure a server is compliant. I'm also working up attacks and DDoS
operations that exploit the ambiguous parts of the RFC using RFuzz.

If you want, hook up with me off list and maybe we can fill out the
RFuzz test suite that does this part of the RFC, then work out the
exploits, *then* beef up Mongrel to deal with it. Could be fun.
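For flavor, here's the shape of one such check, done with a plain
socket and Test::Unit rather than RFuzz's own client (and the 400/413
expectation is my reading of the spec, not something it spells out):

    require 'socket'
    require 'test/unit'

    class HugeHeaderTest < Test::Unit::TestCase
      # Question #5: does the server survive a 1M header element, and
      # does it answer with an error instead of hanging or dying?
      def test_huge_header_is_rejected
        sock = TCPSocket.new('localhost', 3000)
        sock.write("GET / HTTP/1.1\r\nHost: localhost\r\n")
        sock.write("X-Junk: " + ("a" * 1_048_576) + "\r\n\r\n")
        status = sock.readline   # e.g. "HTTP/1.1 400 Bad Request"
        assert_match(/\AHTTP\/1\.\d (400|413)/, status)
      ensure
        sock.close if sock
      end
    end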
--
Zed A. Shaw
http://www.zedshaw.com/
http://mongrel.rubyforge.org/
http://www.railsmachine.com/ -- Need Mongrel support?