Silly question. Where is the proof that using a limited stream is a performance issue? These sorts of things are never going to be the bottleneck and sounds a bit like premature optimisation to think that wrapping it with a length limiting stream is going to be an issue.
Graham On Thursday, April 7, 2011 6:56:35 PM UTC+10, Tom Christie wrote: > > > So, OP should not be trying to read more than CONTENT_LENGTH. > > From the underlying stream, sure. The question is if it's okay to do on > the HttpRequest object. It's an issue because now that HttpRequest exposes > a file-like interface to the stream some users are likely to do things like > this: > > request_data = json.load(request) > > And expect that everything should work fine. > > > Django should provide a limited stream to prevent user reading more > > than CONTENT_LENGTH either by returning an empty end sentinel, > > or perhaps if wanted to be pedantic, raise an error. > > And it does currently provide a limited stream, in some cases, but for > (very reasonable) performance reasons, only when it appears neccessary. > > > Now they just seem to be duplicates > > and it's a good idea to ditch LimitBytes (since LimitedStream implements > > a readline() too) and move it into some common place. http.utils seems a > > good fit for it. > > Moving LimitedStream to somewhere like http.utils sounds like a good plan > to me. > > > Thoughts? > > I think HttpRequest definatly needs to ensure that it's safe to treat it as > any other file-like object, now that it exposes the .read()/.readlines() > methods, and at the moment it sounds like the behaviour of > .read(BUFFER_SIZE) past the end of the input is not well defined. > > 1. It might be reasonable to defer the creation of HttpRequest._stream, but > to _always_ wrap it in LimitedStream if it is created. (IE > HttpRequest._stream is a property that's only initialised when it's > accessed) That'd presumably be a performance gain for any code path that > _doesn't_ access .POST/.raw_post_data/.read, and a performance hit for > anything that _does_. In 99% of cases you'll be doing a .read() operation > without specifying any length at all so the performance hit will be the > initial creation of the LimitedStream object, but you won't actually have > subsequent function calls that incur the extra layer of wrapping. > > 2. Ensure that the .read()/.readline() interfaces will always expose a > LimitedStream, but avoid creating a limited stream for .POST and > .raw_post_data by reading directly from the underlying stream and enforcing > the CONTENT_LENGTH behavior explicitly in those cases. > > 3. Alter the LimitedStream wrapping behaviour to be more cautious. The > current behaviour is to wrap it in LimitedStream if it's known to be > necessary. It could instead wrap it in LimitedStream _unless_ it's known > to be _uneccessary_. > > Do any of those sound like reasonable options? > > I guess the other possibility is: > > 4. Everything is fine as it is, and in real world cases WSGI servers are > passing a limited stream on already. > > It basically sounds like that's the case, but it's not obvious how much we > can rely on that. > > It might be a case that I need to file a seperate bug for this: > > HttpRequest.read(BUFFER_SIZE) behaviour is unclear. > > and place a note on the other two that their resolution is dependant on > this, does that make sense to y'all? > > Really appreciate your input on this, > > Tom > -- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.