--On Tuesday, October 1, 2002 3:26 PM -0700 Greg Stein <[EMAIL PROTECTED]> wrote:
> So you're saying that if I do "file upload" to a PHP script, and upload a
> 10 megabyte file, then it is going to spool that whole mother into memory?

Yup.

> Oh oh... even better. Let's just say that the PHP script isn't even
> *thinking* about handling a request body. Maybe it is only set up for a
> GET request. Or maybe it *is* set up for POST, but only for FORM
> contents. But Mr Attacker comes along and throws 1 gigabyte into the
> request. What then? DoS? Swap hell on the server?

The PHP input filter will always read the body and allocate the space,
irrespective of what the real script wants. In fact, looking at the code, I
believe PHP will only free the memory if the script reads the body (do all
scripts read the entire body?). So, a GET with a body (perfectly valid) may
introduce a memory leak.

PHP uses malloc/realloc/free because it wants the body in one contiguous
chunk; therefore, our pools don't help.

> I think a filter *can* read the request body (i.e. the content generator
> loads a PHP script, PHP runs it (as the first filter), reads the body, and
> loads content from a database). But that implies that the request body
> should not have been thrown out in the default handler.

Correct. At one point, I submitted a patch to the PHP lists to do exactly
that, but once we rearranged how we discard bodies, this method couldn't
work.

The problem we had was deciding when to 'discard' the body. We originally
discarded it at the end, but in order to handle 413s properly, we have to
discard the body before generating the response. That's fairly new behavior
on our part, but one that I think brings us in line with the intent of the
RFC. Otherwise, we could send a 200 and then find out that it really should
have been a 413 (because the body is too large). Therefore, we have to
process the body before generating any content. And, since we now allow
chunked encoding almost everywhere, we have to read the entire body to know
whether it exceeds our limit.
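For illustration, here is a minimal sketch (not PHP's actual code) of the
contiguous-buffer behavior described above: every incoming chunk grows one
malloc/realloc'd allocation, so the attacker controls the total size and a
pool allocator can't help. The names read_body_fn, spool_body, and
read_chunk are hypothetical stand-ins for whatever feeds the filter the
next piece of body data.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK 8192

/* Hypothetical callback: reads up to len bytes of request body into buf,
 * returns the number of bytes read (0 at end of body). */
typedef size_t (*read_body_fn)(char *buf, size_t len, void *ctx);

/* Sketch of spooling an entire request body into one contiguous buffer,
 * the way the PHP input filter is described as doing. */
static char *spool_body(read_body_fn read_chunk, void *ctx, size_t *out_len)
{
    char *body = NULL;
    size_t total = 0;
    size_t got;
    char tmp[CHUNK];

    while ((got = read_chunk(tmp, CHUNK, ctx)) > 0) {
        /* Each iteration grows the single allocation; total size is
         * driven entirely by the client. */
        char *bigger = realloc(body, total + got);
        if (bigger == NULL) {
            free(body);
            return NULL;
        }
        body = bigger;
        memcpy(body + total, tmp, got);
        total += got;
    }
    *out_len = total;
    return body;
}
```

Note there is no size cap anywhere in this loop, which is exactly the DoS
concern: a 1 GB body means a 1 GB heap allocation, and the memory is
malloc'd rather than pool-allocated, so it outlives request-pool cleanup
unless explicitly freed.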
1.3 chickened out on this and forbade chunked encoding on request bodies.

> But it almost seems cleaner to say there is a series of stages which
> perform the request handling: process the body and generate the (initial)
> content. These stages could load a script from somewhere, run it,
> (repeat) and generate the content into the filter stack.
>
> Right now, we are confusing a *script* with *content*.

I think the problem is that we aren't doing a good job of getting the
script the content it (may) need. While it could be interesting to try to
separate reading and writing in Apache, the PHP language certainly doesn't
support that (as I believe you can write and then read the body). So, I'm
not sure that we can split it out into multiple phases in an effective
manner. Reading and writing in PHP (or any CGI script) are just too
intertwined to support this.

I think we're sorta stuck, but I might be wrong.  -- justin
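The 413 ordering described earlier can also be sketched. The point is that
with chunked encoding there is no Content-Length to check up front, so the
server has to read and count the body before emitting any response;
otherwise it might send a 200 and only then discover the body was too
large. This is a hypothetical illustration, not Apache's actual
ap_discard_request_body(); the names read_body_fn and
discard_request_body are invented for the sketch.

```c
#include <stddef.h>

#define HTTP_OK                        200
#define HTTP_REQUEST_ENTITY_TOO_LARGE  413

/* Hypothetical callback: reads up to len bytes of request body into buf,
 * returns the number of bytes read (0 at end of body). */
typedef size_t (*read_body_fn)(char *buf, size_t len, void *ctx);

/* Drain the request body against a limit BEFORE any content is
 * generated, so a too-large body yields 413 rather than a bogus 200. */
static int discard_request_body(read_body_fn read_chunk, void *ctx,
                                size_t limit)
{
    char tmp[8192];
    size_t total = 0;
    size_t got;

    /* A chunked body has no declared length, so the only way to
     * enforce the limit is to read and count. */
    while ((got = read_chunk(tmp, sizeof tmp, ctx)) > 0) {
        total += got;
        if (total > limit)
            return HTTP_REQUEST_ENTITY_TOO_LARGE; /* stop early */
    }
    return HTTP_OK; /* safe to start generating the response */
}
```

Because the loop bails out as soon as the limit is crossed, the discard
path itself only ever buffers one fixed-size chunk at a time, in contrast
to the spool-everything behavior discussed above.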