Hi Eoghan,

Comments inline...
On 2/20/07, Glynn, Eoghan <[EMAIL PROTECTED]> wrote:
> > -----Original Message-----
> > From: Dan Diephouse [mailto:[EMAIL PROTECTED]
> > Sent: 19 February 2007 19:12
> > To: [email protected]
> > Subject: Problems with Chunking
> >
> > Hi All,
> >
> > I did some debugging over the weekend with a user, and ASP.NET
> > seems to have problems if chunking is on. Here is the
> > response that comes back when it is turned on:
> >
> > HTTP/1.1 400 Bad Request
> > Server: ASP.NET Development Server/8.0.0.0
> > Date: Sat, 17 Feb 2007 07:55:29 GMT
> > X-AspNet-Version: 2.0.50727
> > Cache-Control: private
> > Content-Length: 0
> > Connection: Close
> >
> > It works fine, however, if chunking is turned off. There are
> > other servers as well that don't work with chunking, which is
> > why we ultimately turned off chunking.
> >
> > I want to suggest that either
> >
> > a) We turn off chunking by default, or
> > b) We have some threshold for chunking. For instance, first
> > we stream up to 100K to a byte[] buffer. If there is still
> > more to write, we write the buffer and the rest of the
> > request as a chunked request. Otherwise it is written as a
> > non-chunked request.
>
> Well, the problem with this approach is what happens if the request
> is larger than 100k and the server side happens to be ASP.NET? Since
> we fall back to chunking once the 100k threshold is reached,
> presumably the server side will barf and we're back where we started.
> So I don't really like the idea of a band-aid that will work some of
> the time, but allow the old problem to creep back in when there's an
> unexpectedly large outgoing request.
>
> Ironically, we had a long discussion on this list some time back,
> with a lot of opposition expressed to the way the HTTP wrapper output
> stream buffers up the request payload up to the first flush(), so as
> to allow headers to be set by interceptors after the first payload
> write may have occurred.
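Just to make my idea in (b) concrete before responding: the threshold
scheme boils down to an output stream along these lines. This is only
a sketch - the commitChunked()/commitNonChunked() hooks are made-up
names, not anything in our transport today.

import java.io.IOException;
import java.io.OutputStream;

public abstract class ThresholdOutputStream extends OutputStream {

    private static final int THRESHOLD = 100 * 1024; // the 100K cut-off
    private byte[] buffer = new byte[THRESHOLD];
    private int count;
    private OutputStream wire; // non-null once we've picked a mode

    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    public void write(byte[] b, int off, int len) throws IOException {
        if (wire == null) {
            if (count + len <= THRESHOLD) {
                // still under the threshold: keep buffering in memory
                System.arraycopy(b, off, buffer, count, len);
                count += len;
                return;
            }
            // threshold exceeded: commit to a chunked request, flush
            // the buffer, and stream everything else directly
            wire = commitChunked();
            wire.write(buffer, 0, count);
            buffer = null;
        }
        wire.write(b, off, len);
    }

    public void close() throws IOException {
        if (wire == null) {
            // whole payload fit in the buffer: plain request with a
            // Content-Length header, no chunking
            wire = commitNonChunked(count);
            wire.write(buffer, 0, count);
        }
        wire.close();
    }

    // Hypothetical hooks: open the connection in chunked mode, or
    // with a known Content-Length.
    protected abstract OutputStream commitChunked() throws IOException;

    protected abstract OutputStream commitNonChunked(int contentLength) throws IOException;
}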
I don't think it's so ironic. My objection was on the server side. If
you recall, I want the ability to do writes without creating new
buffers so we can do efficient XML routing. This was (is?) impossible
because we were always creating buffers at the transport layer.

It's on my list to review what we currently have, as I still think
that using a CachedOutputStream on the response is a little dodgy. We
shouldn't need to create a file or buffer for the response; we should
be able to just write the headers on the first write(). Are there any
cases where we're creating HTTP headers between the start of writing
a response and the first flush()? I can't think of any. I think the
big use case that was mentioned was that theoretically something
could go wrong while we first start writing, and buffering would
allow us to switch to writing a fault without any consequences. But
if we're already writing, chances are the damage is done and the
fault is more low-level, i.e. there is a problem with the stream
itself.

For normal requests we will want a BufferedOutputStream for
performance reasons, but that is managed by Woodstox right now, as it
wraps the OutputStream when you create an XMLStreamWriter.
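To be concrete about what I mean by writing the headers on the first
write(), something along these lines on the server side would avoid
the CachedOutputStream entirely. A rough sketch only - sendHeaders()
is a placeholder for whatever the transport would actually do:

import java.io.IOException;
import java.io.OutputStream;

public abstract class LazyHeaderOutputStream extends OutputStream {

    private final OutputStream wire;
    private boolean headersSent;

    protected LazyHeaderOutputStream(OutputStream wire) {
        this.wire = wire;
    }

    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    public void write(byte[] b, int off, int len) throws IOException {
        if (!headersSent) {
            headersSent = true;
            sendHeaders(); // commit status line + headers exactly once
        }
        wire.write(b, off, len);
    }

    public void flush() throws IOException {
        wire.flush();
    }

    // Flush the HTTP status line and response headers to the wire.
    protected abstract void sendHeaders() throws IOException;
}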
> But back to the issue at hand ... I guess there are a few other
> situations in which turning off chunking and buffering up the
> request body would be useful, for example if we anticipate a 401
> Basic Auth challenge or a 30x redirect may occur.
>
> So here's a variation on your buffering idea ... instead of imposing
> an arbitrary 100k limit, say we allow unlimited buffering (with
> content overflowing to a local temp file if the payload exceeds some
> size reasonable to keep in memory), but *only* if we have a
> reasonable expectation that the server may be unable to handle
> chunked incoming requests.
>
> This expectation could either be configured, if the client developer
> knows upfront that the server-side stack is buggy in this respect
> (and wow, it really is a fundamental bug, sorta begs the question
> what possesses folks to use such a thing ...). If the server-side
> stack is unknown, then the client could be configured to probe it
> upfront with an innocuous HTTP GET specifying the chunked
> transfer-encoding, but with an entity-body composed of exactly one
> empty chunk. If we get back a 400 response, we infer the server side
> is chunking-intolerant and buffer up the real outgoing POSTs. If, on
> the other hand, we get a 200, then we fall back to chunking.
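(For reference, the probe described above could be as simple as the
following over a raw socket. Purely illustrative - a real client
would reuse its existing connection handling:)

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public final class ChunkingProbe {

    // Send a GET with a chunked, empty entity-body and report whether
    // the server appears to tolerate chunked requests. Hypothetical
    // helper, not part of our transport.
    public static boolean toleratesChunking(String host, int port) throws Exception {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            String probe = "GET / HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Transfer-Encoding: chunked\r\n"
                + "Connection: close\r\n"
                + "\r\n"
                + "0\r\n"    // exactly one empty chunk ...
                + "\r\n";    // ... terminating the body
            out.write(probe.getBytes("US-ASCII"));
            out.flush();

            // Only the status line matters: 400 => chunking-intolerant.
            InputStream in = socket.getInputStream();
            StringBuilder statusLine = new StringBuilder();
            int c;
            while ((c = in.read()) != -1 && c != '\n') {
                statusLine.append((char) c);
            }
            return !statusLine.toString().contains(" 400");
        }
    }

    private ChunkingProbe() {
    }
}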
Are the redirect/authentication cases in particular HTTP server bugs
or limitations of HTTP? It sounds like the latter. I suppose we could
keep a list of servers that we should default to non-chunked, but it
sounds like that doesn't help the other cases.

How about this counter-counter proposal :-) It seems we have a lot of
cases which actually require non-chunked requests:
- broken servers
- authentication
- redirects

So why not turn off chunking by default and put in a log message
which states something to the effect of: "HTTP chunking is turned off
by default for compatibility reasons. For possible performance
improvements, try enabling chunking."

For small requests (i.e. a couple K, which are the most common), it's
likely to be the same performance either way, as Woodstox wraps the
OutputStream in a BufferedOutputStream. Is performance the only
reason you want it turned on by default?
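If we do turn it off by default, anyone who cares about the
performance side could presumably opt back in per client, something
like the following. The conduit/policy names are from memory, so
treat the exact API as an assumption:

import org.apache.cxf.endpoint.Client;
import org.apache.cxf.frontend.ClientProxy;
import org.apache.cxf.transport.http.HTTPConduit;
import org.apache.cxf.transports.http.configuration.HTTPClientPolicy;

public final class ChunkingConfig {

    // Re-enable chunking for a single client proxy; sketch only,
    // assuming the HTTP conduit exposes an HTTPClientPolicy.
    public static void enableChunking(Object proxy) {
        Client client = ClientProxy.getClient(proxy);
        HTTPConduit conduit = (HTTPConduit) client.getConduit();
        HTTPClientPolicy policy = new HTTPClientPolicy();
        policy.setAllowChunking(true); // opt back in for this client
        conduit.setClient(policy);
    }

    private ChunkingConfig() {
    }
}

That would keep the safe behaviour as the default while leaving the
fast path one setting away.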
Regards,
- Dan

--
Dan Diephouse
Envoi Solutions
http://envoisolutions.com | http://netzooid.com/blog