Hi, On Thu, Feb 18, 2010 at 08:40:55PM -0800, Tai Lee wrote: > See also #7581, which has a patch that I have been using in my local > Django branch for over a year without issue. My use case is that I > need to generate large amounts of data in CSV format and stream the > response to avoid timeouts. I wouldn't mind `HttpResponse` consuming > `content` immediately if we had a `Http StreamingResponse` or similar > subclass, but the issue I found after discussion with Malcolm and Ivan > Sagalaev and ccahoon who worked on the http-wsgi-improvements branch > relates to middleware needing to be aware of when they should or > should not consume the content in a response, when they should replace > the content or generator in a response, or when they should be skipped > entirely.
Okay, you're ticket is already tracking the main problem here. I won't create a new one. Instead, I'll add patches/comments to that ticket (after discussion here). I didn't realize you had already done some work on this. Let's work together so to avoid duplicating effort. Here are the use cases that have been identified: 1. A content iterator was passed for no apparent reason. The view could just as easily have returned HttpResponse(''.join(iterator)) with no ill effects. I assume this use-case is what people have in mind when they talk about evaluating the iterator and storing the content in memory or on disk; doing this with large responses could cause a connection timeout. 2. Your use case: * The response content must be streamed. * The content length is unknown and won't be known until after the headers are sent. * An Etag couldn't be calculated because the content is not known until after the headers are sent. * The content stream should be gzip compressed. * Cache headers should be added. * Response content should not be cached locally. 3. My use case: * The response content must be streamed. * The content length is known and the Content-Length header can be set in the view. * The Etag is known and the Etag header can be set in the view. * The streamed content is already compressed, so it should not be compressed again. Further, compressing the stream would invalidate the previously set Content-Length and Etag headers. * Cache headers should be added. * Response content should not be cached locally. These are the fundamental ways in which handling of streaming and non-streaming responses must differ: =================== =============== ======================================= Feature Non-streaming Streaming ------------------- --------------- --------------------------------------- Set Content-Length Yes No ------------------- --------------- --------------------------------------- Set Etag Yes No ------------------- --------------- --------------------------------------- Gzip compress Yes Yes if it would not invalidate existing headers (e.g. Content-Length, Etag) ------------------- --------------- --------------------------------------- Set cache headers Yes Yes ------------------- --------------- --------------------------------------- Cache response Yes No, the response may be too big for locally local caching to make sense (maybe we should cache if we know the content-length ahead of time and it is not too large?). =================== =============== ======================================= Do the above behaviors sound right to everyone? Then there is the question of how to trigger handling as a streaming response. Either we require a view to explicitly request it by using a StreamingHttpResponse instead of a normal HttpResponse, or we assume that if the view returns an iterator, it wants a streaming response. Personally, I think that any view that returns an iterator must want a streaming response. Otherwise, why would it have returned an iterator? I can't think of any other reason to do that. However, I have no objection to requiring views use StreamingHttpResponse explicitly. Once we know that a response must be streamed, I think it makes sense to forbid middleware from breaking streaming by evaluating the iterator prematurely. I've advocated throwing an exception when that happens so that the middleware can be fixed to handle streaming responses correctly. Hopefully we can get some consensus on the above issues. If we can, I'm more than happy to whip up a patch implementing the behavior we can all agree on. > Here's my suggestion taken from a recent comment in #7581: > > I suggest that we simply add an attribute > `HttpResponse.disabled_middleware` (list or tuple). Any installed > middleware specified there should be bypassed when applying response > middleware. Django could then ship with an `HttpStreamingResponse` > class that disables any included middleware that is known not to work > with generator content, and this could easily be subclassed and > updated by users who need to disable any 3rd party middleware to > create a streaming response. Okay, I think "disabled_middleware" is a bad name because it doesn't actually imply that all middleware should be disabled. Anyway, it seems like middleware could just check if isinstance(response, StreamingHttpResponse). I look forward to your feedback. Thanks, Forest -- Forest Bond http://www.alittletooquiet.net http://www.pytagsfs.org
signature.asc
Description: Digital signature