This is one I've already decided is best handled on the Apache side (I'm using mod_proxy to talk to paster serve), but here's the issue on the Python side. I'm not sure there's anything that can be done in Paste here, since the ultimate cause seems to be the behavior of "socket" itself; so feel free to ignore.
The initial problem is a malicious user who posts an enormous file to the site. This keeps a long-running connection open and can fill up the filesystem when the upload is stored as a tempfile; or, if the application needs to inspect the stream fully (such as for image processing), it can cause the app to run out of memory.

The recipe at http://wiki.pylonshq.com/display/pylonscookbook/Hacking+Pylons+for+handling+large+file+upload solves most of this problem by setting cgi.maxlen to a certain value, so that a stream which is too large is rejected. It also replaces Paste's Cascade with a modified version, DirectCascade, that does not write wsgi.input to a tempfile, so the data isn't written to the filesystem either. Cascade does not provide any option for disabling this directly; it would be nice if it did.

However, in my testing I've observed that after the DirectCascade class sends control to the next application, which then correctly throws an exception due to the cgi.maxlen limit, the connection to the client stays open forever (which is one of the things we're trying to defeat here). Normally, when Cascade is in use, the entire socket contents have already been read to a tempfile, so the client connection shuts down and the error gets reported on the client side. It seems that fully reading the buffer is the only way to get the connection to really close; otherwise the browser hangs open, and only reports failure when the server is killed (which is why I think it's still "connected". If my understanding of TCP sockets, which is pretty much at the "series of tubes" level, is collapsing here, feel free to break out the cluestick).

So if I modify the last line of DirectCascade to catch all exceptions, do a full read() of wsgi.input and then re-raise, the connection shuts down afterwards, and the whole thing acts pretty much like Cascade except that it doesn't write the content to the filesystem.
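For anyone who wants to try the catch, drain, re-raise workaround described above without patching DirectCascade itself, here is a rough sketch of the same idea as a standalone WSGI wrapper. The class name DrainOnError and the chunk_size parameter are made up for illustration and are not part of Paste or the recipe. It also reads in chunks and throws the data away rather than doing a single read(), so the oversized body is never held in memory all at once. It assumes wsgi.input stops returning data once the declared request body is exhausted (which is what a length-limited wrapper around the socket should give you), and it only catches exceptions raised while the app is being called, not ones raised later while the response is iterated.

```python
import io


class DrainOnError:
    """Hypothetical WSGI wrapper: if the wrapped app raises, discard the
    unread request body, then re-raise the original exception."""

    def __init__(self, app, chunk_size=64 * 1024):
        self.app = app
        self.chunk_size = chunk_size

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception:
            # Read and throw away whatever the client is still sending,
            # so the server can finish with the connection.
            self._drain(environ.get("wsgi.input"))
            raise

    def _drain(self, stream):
        """Read the stream to exhaustion in fixed-size chunks.
        Returns the number of bytes discarded."""
        if stream is None:
            return 0
        discarded = 0
        while True:
            chunk = stream.read(self.chunk_size)
            if not chunk:
                break
            discarded += len(chunk)
        return discarded


if __name__ == "__main__":
    # Stand-in for an app that rejects the upload (e.g. via cgi.maxlen).
    def rejecting_app(environ, start_response):
        raise ValueError("Maximum content length exceeded")

    fake_body = io.BytesIO(b"x" * 200000)  # stand-in for the socket stream
    try:
        DrainOnError(rejecting_app)({"wsgi.input": fake_body}, None)
    except ValueError as exc:
        print("app raised:", exc)
    print("bytes left unread:", len(fake_body.read()))
```

This still costs the time and bandwidth of receiving the whole upload; it only avoids the disk and memory usage.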
This is almost a full solution (though a pretty hacky one). But I really don't want to read the buffer at all, and this is the point at which I can't get it to work any better. Calling close() on the actual socket (which is a few levels down in this case) doesn't seem to really shut everything down, and neither does socket.shutdown(). To get at the socket I had to navigate through the paste.httpserver.LimitedLengthFile object, which has a few annoying traits: it overrides __repr__() to pretend it's not really there (I don't really see the purpose of that), and it doesn't implement close() (not that close() helps anyway). When I call environ['wsgi.input'].file.close(), or even environ['wsgi.input'].file._sock.shutdown(socket.SHUT_RDWR), the operation succeeds but the browser stays "connected" (using our previous definition of "connected", i.e. "keeps acting like it's sending and seems to know when the server is killed").

My solution therefore is to just do this on the Apache side using LimitRequestBody. It's a single directive which, assuming Apache shuts down the connection gracefully, would solve the whole problem nicely.
