Hi,

On Thu, Feb 18, 2010 at 08:40:55PM -0800, Tai Lee wrote:
> See also #7581, which has a patch that I have been using in my local
> Django branch for over a year without issue. My use case is that I
> need to generate large amounts of data in CSV format and stream the
> response to avoid timeouts. I wouldn't mind `HttpResponse` consuming
> `content` immediately if we had a `Http StreamingResponse` or similar
> subclass, but the issue I found after discussion with Malcolm and Ivan
> Sagalaev and ccahoon who worked on the http-wsgi-improvements branch
> relates to middleware needing to be aware of when they should or
> should not consume the content in a response, when they should replace
> the content or generator in a response, or when they should be skipped
> entirely.

Okay, you're ticket is already tracking the main problem here.  I won't create a
new one.  Instead, I'll add patches/comments to that ticket (after discussion
here).

I didn't realize you had already done some work on this.  Let's work together so
to avoid duplicating effort.

Here are the use cases that have been identified:

1. A content iterator was passed for no apparent reason.  The view could just
   as easily have returned HttpResponse(''.join(iterator)) with no ill effects.
   I assume this use-case is what people have in mind when they talk about
   evaluating the iterator and storing the content in memory or on disk; doing
   this with large responses could cause a connection timeout.

2. Your use case:

    * The response content must be streamed.
    * The content length is unknown and won't be known until after the headers
      are sent.
    * An Etag couldn't be calculated because the content is not known until
      after the headers are sent.
    * The content stream should be gzip compressed.
    * Cache headers should be added.
    * Response content should not be cached locally.

3. My use case:

    * The response content must be streamed.
    * The content length is known and the Content-Length header can be set in
      the view.
    * The Etag is known and the Etag header can be set in the view.
    * The streamed content is already compressed, so it should not be compressed
      again.  Further, compressing the stream would invalidate the previously
      set Content-Length and Etag headers.
    * Cache headers should be added.
    * Response content should not be cached locally.

These are the fundamental ways in which handling of streaming and non-streaming
responses must differ:

=================== =============== =======================================
Feature             Non-streaming   Streaming
------------------- --------------- ---------------------------------------
Set Content-Length  Yes             No
------------------- --------------- ---------------------------------------
Set Etag            Yes             No
------------------- --------------- ---------------------------------------
Gzip compress       Yes             Yes if it would not invalidate existing
                                    headers (e.g. Content-Length, Etag)
------------------- --------------- ---------------------------------------
Set cache headers   Yes             Yes
------------------- --------------- ---------------------------------------
Cache response      Yes             No, the response may be too big for
locally                             local caching to make sense (maybe we
                                    should cache if we know the
                                    content-length ahead of time and it is
                                    not too large?).
=================== =============== =======================================

Do the above behaviors sound right to everyone?

Then there is the question of how to trigger handling as a streaming response.
Either we require a view to explicitly request it by using a
StreamingHttpResponse instead of a normal HttpResponse, or we assume that if the
view returns an iterator, it wants a streaming response.

Personally, I think that any view that returns an iterator must want a streaming
response.  Otherwise, why would it have returned an iterator?  I can't think of
any other reason to do that.  However, I have no objection to requiring views
use StreamingHttpResponse explicitly.

Once we know that a response must be streamed, I think it makes sense to forbid
middleware from breaking streaming by evaluating the iterator prematurely.  I've
advocated throwing an exception when that happens so that the middleware can be
fixed to handle streaming responses correctly.

Hopefully we can get some consensus on the above issues.  If we can, I'm more
than happy to whip up a patch implementing the behavior we can all agree on.

> Here's my suggestion taken from a recent comment in #7581:
> 
> I suggest that we simply add an attribute
> `HttpResponse.disabled_middleware` (list or tuple). Any installed
> middleware specified there should be bypassed when applying response
> middleware. Django could then ship with an `HttpStreamingResponse`
> class that disables any included middleware that is known not to work
> with generator content, and this could easily be subclassed and
> updated by users who need to disable any 3rd party middleware to
> create a streaming response.

Okay, I think "disabled_middleware" is a bad name because it doesn't actually
imply that all middleware should be disabled.  Anyway, it seems like middleware
could just check if isinstance(response, StreamingHttpResponse).

I look forward to your feedback.

Thanks,
Forest
-- 
Forest Bond
http://www.alittletooquiet.net
http://www.pytagsfs.org

Attachment: signature.asc
Description: Digital signature

Reply via email to