Re: HttpRequest.read() issues

Graham Dumpleton Thu, 07 Apr 2011 03:07:55 -0700

Silly question. Where is the proof that using a limited stream is a 
performance issue? These sorts of things are never going to be the 
bottleneck and sounds a bit like premature optimisation to think that 
wrapping it with a length limiting stream is going to be an issue.


Graham

On Thursday, April 7, 2011 6:56:35 PM UTC+10, Tom Christie wrote:
>
> > So, OP should not be trying to read more than CONTENT_LENGTH.
>
> From the underlying stream, sure.  The question is if it's okay to do on 
> the HttpRequest object.  It's an issue because now that HttpRequest exposes 
> a file-like interface to the stream some users are likely to do things like 
> this:
>
> request_data = json.load(request)
>
> And expect that everything should work fine.
>
> > Django should provide a limited stream to prevent user reading more
> > than CONTENT_LENGTH either by returning an empty end sentinel,
> > or perhaps if wanted to be pedantic, raise an error.
>
> And it does currently provide a limited stream, in some cases, but for 
> (very reasonable) performance reasons, only when it appears neccessary.
>
> > Now they just seem to be duplicates 
> > and it's a good idea to ditch LimitBytes (since LimitedStream implements 
> > a readline() too) and move it into some common place. http.utils seems a 
> > good fit for it.
>
> Moving LimitedStream to somewhere like http.utils sounds like a good plan 
> to me.
>
> > Thoughts?
>
> I think HttpRequest definatly needs to ensure that it's safe to treat it as 
> any other file-like object, now that it exposes the .read()/.readlines() 
> methods, and at the moment it sounds like the behaviour of 
> .read(BUFFER_SIZE) past the end of the input is not well defined.
>
> 1. It might be reasonable to defer the creation of HttpRequest._stream, but 
> to _always_ wrap it in LimitedStream if it is created.  (IE 
> HttpRequest._stream is a property that's only initialised when it's 
> accessed)  That'd presumably be a performance gain for any code path that 
> _doesn't_ access .POST/.raw_post_data/.read, and a performance hit for 
> anything that _does_.  In 99% of cases you'll be doing a .read() operation 
> without specifying any length at all so the performance hit will be the 
> initial creation of the LimitedStream object, but you won't actually have 
> subsequent function calls that incur the extra layer of wrapping.
>
> 2. Ensure that the .read()/.readline() interfaces will always expose a 
> LimitedStream, but avoid creating a limited stream for .POST and 
> .raw_post_data by reading directly from the underlying stream and enforcing 
> the CONTENT_LENGTH behavior explicitly in those cases.
>
> 3. Alter the LimitedStream wrapping behaviour to be more cautious.  The 
> current behaviour is to wrap it in LimitedStream if it's known to be 
> necessary.    It could instead wrap it in LimitedStream _unless_ it's known 
> to be _uneccessary_.
>
> Do any of those sound like reasonable options?
>
> I guess the other possibility is:
>
> 4. Everything is fine as it is, and in real world cases WSGI servers are 
> passing a limited stream on already.
>
> It basically sounds like that's the case, but it's not obvious how much we 
> can rely on that.
>
> It might be a case that I need to file a seperate bug for this:
>
> HttpRequest.read(BUFFER_SIZE) behaviour is unclear.
>
> and place a note on the other two that their resolution is dependant on 
> this, does that make sense to y'all?
>
> Really appreciate your input on this,
>
>   Tom
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.

Re: HttpRequest.read() issues

Reply via email to