Hi Ben

On Tue, Jun 27, 2023 at 9:54 PM Ben Ramsey <b...@benramsey.com> wrote:
>
> > On Jun 27, 2023, at 04:01, Ilija Tovilo <tovilo.il...@gmail.com> wrote:
> >
> > Hi Ben, Hi Rowan
> >
> > On Mon, Jun 26, 2023 at 8:55 PM Ben Ramsey <b...@benramsey.com> wrote:
> >>
> >>> On Jun 20, 2023, at 06:06, Rowan Tommins <rowan.coll...@gmail.com> wrote:
> >>>
> >>> On Tue, 20 Jun 2023 at 10:25, Ilija Tovilo <tovilo.il...@gmail.com> wrote:
> >>>
> >>>> Introduce a new function (currently named populate_post_data()) to
> >>>> read the input stream and populate the $_POST and $_FILES
> >>>> superglobals.
>
> In the past, I’ve used something like the following to solve this:
>
>     parse_str(file_get_contents('php://input'), $data);
>
> I haven’t looked up how any of the frameworks solve this, but I would be 
> willing to bet they also do something similar.
>
> Rather than implementing functionality to populate globals, would you be 
> interested in introducing some new HTTP request functions? Something like:
>
>     http_request_body(): string
>     http_parse_query(string $queryString): array
>
> `http_request_body()` would return the raw body and would be the equivalent 
> of calling `file_get_contents('php://input')`. Of special note is that it 
> should _always_ return the raw body, even if `$_POST` is populated, for the 
> sake of consistency and reducing confusion.
>
> `http_parse_query()` would be the opposite of `http_build_query()` and would 
> return a value instead of requiring a reference parameter, like `parse_str()`.

The problem is that the content stream for multipart/form-data is
expected to be big, as in possibly multiple gigabytes big. We can't
use http_request_body() to return the entire content as a string at
once. The current RFC1867 implementation reads and operates in chunks,
i.e. it appends each chunk to a file or to a string, depending on the
content part. It never has to hold the entire content in memory.
http_request_body() also can't return the content of the request again
after it has been consumed, because that's not how the HTTP protocol
works. We would need to buffer the content somewhere when reading it
for the first time, which again we can't do because it may be very
big.
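
To illustrate the difference between chunked handling and reading the
whole body into one string, here is a minimal userland sketch. It is
not the RFC1867 implementation; copy_stream_in_chunks() and the 8192
byte chunk size are hypothetical names and values chosen for this
example only.

```php
<?php
// Hypothetical helper, for illustration only: copy a readable stream to a
// file in fixed-size chunks, so that only about $chunkSize bytes of the
// body are held in memory at any time.
function copy_stream_in_chunks($source, string $targetPath, int $chunkSize = 8192): int
{
    $target = fopen($targetPath, 'wb');
    if ($target === false) {
        throw new RuntimeException("Cannot open $targetPath for writing");
    }

    $total = 0;
    try {
        while (!feof($source)) {
            $chunk = fread($source, $chunkSize);
            if ($chunk === false || $chunk === '') {
                break; // read error or nothing left to read
            }
            fwrite($target, $chunk);
            $total += strlen($chunk);
        }
    } finally {
        fclose($target);
    }

    return $total; // number of bytes copied
}
```

Under these assumptions, something like
copy_stream_in_chunks(fopen('php://input', 'rb'), $somePath) would
keep memory usage roughly constant regardless of body size, whereas
file_get_contents('php://input') needs memory proportional to the
whole body.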

It may be possible to pass the fopen('php://input', 'r') stream to
this function and let it consume it. However, as mentioned in my
original e-mail this requires some changes to how RFC1867 requests are
handled. Currently, it calls sapi_module.read_post(), which reads
directly from the TCP socket. Instead, we'd need to read from the
stream, possibly as an additional code path so that the general case
does not lose performance. I'll verify whether this is an option, and
whether the changes are (too) big. However, I don't expect there to be
many use cases for this, as RFC1867 is primarily used for requests and
not for responses, so you wouldn't usually need to parse this type of
content from some other source.

As for returning the parsed values as non-globals, that's entirely
possible. However, it's inconsistent with how requests are currently
handled. The values will need to be passed around manually and kept
alive, but the function still modifies global state (i.e. the input
stream, whether that's sapi_module.read_post() or php://input). I
don't believe it will be common to call this function more than once
per request, and thus decoupling the state is not really necessary.
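
As a rough userland analogue of that point, the sketch below returns
the parsed values instead of populating superglobals, yet it still
consumes the stream it is given. parse_urlencoded_stream() is a
hypothetical name, not the proposed API, and it only handles small
URL-encoded bodies because it reads everything into memory.

```php
<?php
// Hypothetical helper, for illustration only: parse a URL-encoded body
// from a stream and return the result instead of writing to $_POST.
function parse_urlencoded_stream($stream): array
{
    // Reading consumes the stream: its position is shared state that the
    // caller can observe after this function returns.
    $raw = stream_get_contents($stream);
    if ($raw === false) {
        throw new RuntimeException('Unable to read the request body');
    }

    parse_str($raw, $data);

    return $data;
}
```

Calling this helper a second time on the same stream, without
rewinding it, yields an empty array: the returned data is decoupled
from globals, but the state of the input stream is not.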

Ilija

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
