Hi Eoghan,

Comments inline...
On 2/20/07, Glynn, Eoghan <[EMAIL PROTECTED]> wrote:
> > -----Original Message-----
> > From: Dan Diephouse [mailto:[EMAIL PROTECTED]
> > Sent: 19 February 2007 19:12
> > To: [email protected]
> > Subject: Problems with Chunking
> >
> > Hi All,
> >
> > I did some debugging over the weekend with a user, and ASP.NET
> > seems to have problems if chunking is on. Here is the
> > response that comes back when it is turned on:
> >
> > HTTP/1.1 400 Bad Request
> > Server: ASP.NET Development Server/8.0.0.0
> > Date: Sat, 17 Feb 2007 07:55:29 GMT
> > X-AspNet-Version: 2.0.50727
> > Cache-Control: private
> > Content-Length: 0
> > Connection: Close
> >
> > It works fine, however, if chunking is turned off. There are
> > other servers as well that don't work with chunking, which is
> > why we ultimately turned off chunking.
> >
> > I want to suggest that either
> >
> > a) We turn off chunking by default, or
> > b) We have some threshold for chunking. For instance, first
> > we stream up to 100K to a byte[] buffer. If there is still
> > more to write, we write the buffer and the rest of the
> > request as a chunked request. Otherwise it is written as a
> > non-chunked request.
>
> Well, the problem with this approach is what happens if the request
> is larger than 100k and the server side happens to be ASP.NET? Since
> we fall back to chunking once the 100k threshold is reached,
> presumably the server side will barf and we're back where we started.
> So I don't really like the idea of a band-aid that will work some of
> the time, but allow the old problem to creep back in when there's an
> unexpectedly large outgoing request.
>
> Ironically, we had a long discussion on this list some time back,
> with a lot of opposition expressed to the way the HTTP wrapper output
> stream buffers up the request payload up to the first flush(), so as
> to allow headers to be set by interceptors after the first payload
> write may have occurred.
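Just to make my idea in (b) concrete before responding: the threshold
scheme boils down to an output stream along these lines. This is only
a sketch - the commitChunked()/commitNonChunked() hooks are made-up
names, not anything in our transport today.

import java.io.IOException;
import java.io.OutputStream;

public abstract class ThresholdOutputStream extends OutputStream {

    private static final int THRESHOLD = 100 * 1024; // the 100K cut-off
    private byte[] buffer = new byte[THRESHOLD];
    private int count;
    private OutputStream wire; // non-null once we've picked a mode

    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    public void write(byte[] b, int off, int len) throws IOException {
        if (wire == null) {
            if (count + len <= THRESHOLD) {
                // still under the threshold: keep buffering in memory
                System.arraycopy(b, off, buffer, count, len);
                count += len;
                return;
            }
            // threshold exceeded: commit to a chunked request, flush
            // the buffer, and stream everything else directly
            wire = commitChunked();
            wire.write(buffer, 0, count);
            buffer = null;
        }
        wire.write(b, off, len);
    }

    public void close() throws IOException {
        if (wire == null) {
            // whole payload fit in the buffer: plain request with a
            // Content-Length header, no chunking
            wire = commitNonChunked(count);
            wire.write(buffer, 0, count);
        }
        wire.close();
    }

    // Hypothetical hooks: open the connection in chunked mode, or
    // with a known Content-Length.
    protected abstract OutputStream commitChunked() throws IOException;

    protected abstract OutputStream commitNonChunked(int contentLength) throws IOException;
}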
I don't think it's so ironic. My objection was on the server side. If
you recall, I want the ability to do writes without creating new
buffers so we can do efficient XML routing. This was (is?) impossible
because we were always creating buffers at the transport layer.

It's on my list to review what we currently have, as I still think
that using a CachedOutputStream on the response is a little dodgy. We
shouldn't need to create a file or buffer for the response; we should
be able to just write the headers on the first write(). Are there any
cases where we're creating HTTP headers between the start of writing
a response and the first flush()? I can't think of any. I think the
big use case that was mentioned was that theoretically something
could go wrong while we first start writing, and buffering would
allow us to switch to writing a fault without any consequences. But
if we're already writing, chances are the damage is done and the
fault is more low-level, i.e. there is a problem with the stream
itself.

For normal requests we will want a BufferedOutputStream for
performance reasons, but that is managed by Woodstox right now, as it
wraps the OutputStream when you create an XMLStreamWriter.
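To be concrete about what I mean by writing the headers on the first
write(), something along these lines on the server side would avoid
the CachedOutputStream entirely. A rough sketch only - sendHeaders()
is a placeholder for whatever the transport would actually do:

import java.io.IOException;
import java.io.OutputStream;

public abstract class LazyHeaderOutputStream extends OutputStream {

    private final OutputStream wire;
    private boolean headersSent;

    protected LazyHeaderOutputStream(OutputStream wire) {
        this.wire = wire;
    }

    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    public void write(byte[] b, int off, int len) throws IOException {
        if (!headersSent) {
            headersSent = true;
            sendHeaders(); // commit status line + headers exactly once
        }
        wire.write(b, off, len);
    }

    public void flush() throws IOException {
        wire.flush();
    }

    // Flush the HTTP status line and response headers to the wire.
    protected abstract void sendHeaders() throws IOException;
}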
> But back to the issue at hand ... I guess there are a few other
> situations in which turning off chunking and buffering up the
> request body would be useful, for example if we anticipate a 401
> Basic Auth challenge or a 30x redirect may occur.
>
> So here's a variation on your buffering idea ... instead of imposing
> an arbitrary 100k limit, say we allow unlimited buffering (with
> content overflowing to a local temp file if the payload exceeds some
> size reasonable to keep in memory), but *only* if we have a
> reasonable expectation that the server may be unable to handle
> chunked incoming requests.
>
> This expectation could either be configured, if the client developer
> knows upfront that the server-side stack is buggy in this respect
> (and wow, it really is a fundamental bug, sorta begs the question
> what possesses folks to use such a thing ...). If the server-side
> stack is unknown, then the client could be configured to probe it
> upfront with an innocuous HTTP GET specifying the chunked
> transfer-encoding, but with an entity-body composed of exactly one
> empty chunk. If we get back a 400 response, we infer the server side
> is chunking-intolerant and buffer up the real outgoing POSTs. If, on
> the other hand, we get a 200, then we fall back to chunking.
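(For reference, the probe described above could be as simple as the
following over a raw socket. Purely illustrative - a real client
would reuse its existing connection handling:)

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public final class ChunkingProbe {

    // Send a GET with a chunked, empty entity-body and report whether
    // the server appears to tolerate chunked requests. Hypothetical
    // helper, not part of our transport.
    public static boolean toleratesChunking(String host, int port) throws Exception {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            String probe = "GET / HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Transfer-Encoding: chunked\r\n"
                + "Connection: close\r\n"
                + "\r\n"
                + "0\r\n"    // exactly one empty chunk ...
                + "\r\n";    // ... terminating the body
            out.write(probe.getBytes("US-ASCII"));
            out.flush();

            // Only the status line matters: 400 => chunking-intolerant.
            InputStream in = socket.getInputStream();
            StringBuilder statusLine = new StringBuilder();
            int c;
            while ((c = in.read()) != -1 && c != '\n') {
                statusLine.append((char) c);
            }
            return !statusLine.toString().contains(" 400");
        }
    }

    private ChunkingProbe() {
    }
}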
Are the redirect/authentication cases in particular HTTP server bugs
or limitations of HTTP? It sounds like the latter. I suppose we could
keep a list of servers that we should default to non-chunked, but it
sounds like that doesn't help the other cases.

How about this counter-counter proposal :-) It seems we have a lot of
cases which actually require non-chunked requests:
- broken servers
- authentication
- redirects

So why not turn off chunking by default and put in a log message
which states something to the effect of: "HTTP chunking is turned off
by default for compatibility reasons. For possible performance
improvements, try enabling chunking."

For small requests (i.e. a couple K, which are the most common), it's
likely to be the same performance either way, as Woodstox wraps the
OutputStream in a BufferedOutputStream. Is performance the only
reason you want it turned on by default?
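If we do turn it off by default, anyone who cares about the
performance side could presumably opt back in per client, something
like the following. The conduit/policy names are from memory, so
treat the exact API as an assumption:

import org.apache.cxf.endpoint.Client;
import org.apache.cxf.frontend.ClientProxy;
import org.apache.cxf.transport.http.HTTPConduit;
import org.apache.cxf.transports.http.configuration.HTTPClientPolicy;

public final class ChunkingConfig {

    // Re-enable chunking for a single client proxy; sketch only,
    // assuming the HTTP conduit exposes an HTTPClientPolicy.
    public static void enableChunking(Object proxy) {
        Client client = ClientProxy.getClient(proxy);
        HTTPConduit conduit = (HTTPConduit) client.getConduit();
        HTTPClientPolicy policy = new HTTPClientPolicy();
        policy.setAllowChunking(true); // opt back in for this client
        conduit.setClient(policy);
    }

    private ChunkingConfig() {
    }
}

That would keep the safe behaviour as the default while leaving the
fast path one setting away.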
Regards,
- Dan

--
Dan Diephouse
Envoi Solutions
http://envoisolutions.com | http://netzooid.com/blog