I've already decided it's best to handle this on the Apache side (I'm
using mod_proxy to talk to paster serve), but here's the issue on the
Python side. I'm not sure there's anything that can be done in Paste
here, since the behavior of "socket" seems to be what ultimately
causes the problem; so feel free to ignore.

The initial problem is a malicious user who posts an enormous file to
the site. This keeps a long-running connection open and can fill up
the filesystem when the upload is stored as a tempfile; or, if the
application needs to inspect the stream fully (such as for image
processing), it can cause the app to run out of memory.

The recipe at
http://wiki.pylonshq.com/display/pylonscookbook/Hacking+Pylons+for+handling+large+file+upload
solves most of this problem by setting cgi.maxlen to a certain value
so that a stream which is too large is rejected. It also replaces
Paste's Cascade with a modified version that does not write
wsgi.input to a tempfile, so that the data isn't written to the
filesystem either. Cascade does not provide any option for disabling
this directly, which would be nice to have.
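
To make the size limit concrete, here is a minimal sketch of the same
idea done as WSGI middleware. It is not the recipe's code: the class
name, the limit and the 413 response are all my own assumptions, and it
swaps cgi.maxlen for an explicit check. Like cgi.maxlen (as far as I
understand it), it only looks at the declared Content-Length and never
touches wsgi.input.

```python
MAX_BODY_BYTES = 10 * 1024 * 1024  # hypothetical limit (10 MiB)


class RejectLargeBody:
    """Hypothetical middleware: refuse requests whose declared
    Content-Length exceeds a limit, without reading wsgi.input."""

    def __init__(self, app, max_bytes=MAX_BODY_BYTES):
        self.app = app
        self.max_bytes = max_bytes

    def __call__(self, environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            # Unparseable header: treat as "no declared length".
            length = 0
        if length > self.max_bytes:
            body = b"Request body too large\n"
            start_response("413 Request Entity Too Large", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Small enough: hand the request to the wrapped application.
        return self.app(environ, start_response)
```

Since the body is left unread when a request is rejected, a check like
this only decides whether to reject; it says nothing about what happens
to the client connection afterwards.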

However, in my testing I've observed that after the recipe's
DirectCascade class passes control to the next application, which
then correctly throws an exception due to the cgi.maxlen limit, the
connection to the client stays open forever (which is one of the
things we're trying to defeat here). Normally, when Cascade is in
use, the entire socket contents have been read to a tempfile, the
client connection shuts down and the error gets reported on the
client side. It seems that fully reading the buffer is the only way
to get the connection to really close; otherwise the browser hangs,
and only reports failure when the server is killed (which is why I
think it's still "connected"). If my understanding of TCP sockets,
which is pretty much at the "series of tubes" level, is collapsing
here, feel free to break out the cluestick.

So if I modify the last line of DirectCascade to catch all exceptions,
do a full read() of wsgi.input and then re-raise, the connection shuts
down afterwards, and the whole thing acts pretty much like Cascade
except that it doesn't write the content to the filesystem. This is
almost a full solution (though a pretty hacky one). But I really
don't want to read the buffer at all, and this is the point at which
I can't get it to work any better. Calling close() on the actual
socket (which is a few levels down in this case) doesn't seem to
really shut everything down (nor does socket.shutdown()). To get at
the socket I had to navigate through the
paste.httpserver.LimitedLengthFile object, which has a few
annoyances: it overrides __repr__() to pretend it's not really there
(I don't really see the purpose of that), and it doesn't implement
close() (which doesn't work anyway). When I call
environ['wsgi.input'].file.close(), or even
environ['wsgi.input'].file._sock.shutdown(socket.SHUT_RDWR), the
operation succeeds but the browser stays "connected" (using our
previous definition of "connected", i.e. "keeps acting like it's
sending and seems to know when the server is killed").
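
The catch, drain and re-raise hack described above could be sketched
roughly like this. It is written as a standalone wrapper rather than
as the actual change to DirectCascade; the class name and chunk size
are hypothetical. It also reads and discards fixed-size chunks instead
of doing one big read(), so the body never sits in memory all at once.
It assumes wsgi.input stops at Content-Length (which, as I understand
it, is what LimitedLengthFile is for); on a raw socket file the loop
could block.

```python
class DrainOnError:
    """Hypothetical wrapper: if the wrapped app raises, discard the
    rest of the request body, then re-raise the original exception."""

    def __init__(self, app, chunk_size=64 * 1024):
        self.app = app
        self.chunk_size = chunk_size

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception:
            # Only covers errors raised while the app is being called,
            # not ones raised later while iterating its response.
            self._drain(environ.get("wsgi.input"))
            raise

    def _drain(self, stream):
        if stream is None:
            return
        # Read and throw away chunks until the stream reports EOF.
        while stream.read(self.chunk_size):
            pass
```

This still pulls the whole upload across the wire, which is exactly
the part I'd like to avoid; it just keeps it off the disk and out of
memory.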

My solution therefore is to just do this on the Apache side using
LimitRequestBody. It's a single directive which, assuming Apache
shuts down the connection gracefully, would solve the whole problem
nicely.


