On Mon, Dec 1, 2025 at 9:53 PM Máté Kocsis <[email protected]> wrote:
> Hi Everyone,
>
> I'd like to introduce my latest RFC that I've been working on for a while
> now: https://wiki.php.net/rfc/uri_followup.
>
> It proposes 5 followup improvements for ext/uri in the following areas:
> - URI Building
> - Query Parameter Manipulation
> - Accessing Path Segments as an Array
> - Host Type Detection
> - URI Type Detection
> - Percent-Encoding and Decoding Support
>
> I did my best to write an RFC that was at least as extensive as
> https://wiki.php.net/rfc/url_parsing_api had become by the end. Despite
> my efforts, there are still a couple things which need a final decision,
> or which need to be polished/improved. Some examples:
>
> - How to support array/object values for constructing query strings?
>   (https://wiki.php.net/rfc/uri_followup#type_support)
> - How to make the UriQueryParams and UrlQueryParams classes more
>   interoperable with the query string component (mainly with respect to
>   percent-encoding)?
>   (https://wiki.php.net/rfc/uri_followup#percent-encoding_and_decoding)
> - Exactly how the advanced percent-decoding capabilities should work? Does
>   it make sense to support all the possible modes (UriPercentEncodingMode)
>   for percent-decoding as well
>   (https://wiki.php.net/rfc/uri_followup#percent-encoding_and_decoding_support)
> - etc.
>
> Regards,
> Máté
>

Hi Máté,

After thinking about it, here is my take on the Query Parameter Manipulation part of the current proposal. Sorry for the wall of text, but I tried to summarize my thoughts.

First of all, I tried to put myself in the shoes of a regular PHP developer who has little to no knowledge of the different URI specifications but has a general grasp of PHP. From that point of view, the developer knows that:

- PHP already gives access to the URI query parameters via the `$_GET` superglobal;
- to parse a query string in PHP, they can rely on `parse_str`;
- to build a query string, they should use the `http_build_query` function.
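For reference, here is a minimal sketch of how the last two of those entry points behave today (`$_GET` is left out because it needs a web request). The variable names `$parsed` and `$built` are only illustrative.

```php
<?php
// Illustrative only: current behaviour of the existing query-string functions.

// parse_str() rewrites "." in a variable name to "_" and interprets "[]"
// as "append to a PHP array".
parse_str('foo.bar=baz&a[]=1&a[]=2', $parsed);
// $parsed is now ['foo_bar' => 'baz', 'a' => ['1', '2']]

// http_build_query() adds percent-encoded brackets and numeric indices,
// even when the array is a plain list.
$built = http_build_query(['a' => [3, 5, 7]]);
// $built is now 'a%5B0%5D=3&a%5B1%5D=5&a%5B2%5D=7'
```

Note that the original key `foo.bar` is not recoverable from `$parsed`.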
What we do know is that the `$_GET` values are produced by the same logic as `parse_str`, and that this logic:

- is not documented
- is PHP centric
- mangles the data
- truncates the query string

Its original goal was to allow direct conversion of a query string into PHP variables usable in scripts. But this behaviour has been removed from PHP for security reasons.

`http_build_query` allows creating a query string in a more predictable way but still exposes PHP centric behaviour:

- It uses `get_object_vars` on objects, which is counter-intuitive:
  - Not all `iterable` structures give the same result.
  - Depending on the object implementation, the result varies between PHP versions (e.g. `DateTimeImmutable` used to be rendered before PHP 7.4; since then it fails silently and an empty string is generated).
- It adds "[", "]" and indices around arrays. This is PHP centric (other languages would just repeat the array name).
- It always adds the array indices, even when the array is a list, which again can lead to unexpected behaviour, even within the PHP ecosystem.

On the other hand:

- Other modern platforms, such as Java's HttpServletRequest or the WHATWG URLSearchParams, take a completely different approach: they view the query string as a collection of tuples (key/value pairs) whose keys can be repeated; there is no notion of brackets. The data is preserved, even though, as you mention, the round-trip between encoding and decoding is never guaranteed.
- We have the new HTTP QUERY method, which may or may not fall under the question "Should this also be managed by a putative Query class?".

Currently, your proposal has 2 Query objects. This will give the developer a lot of work to understand where, when and which object to choose, and why. Is that complexity really needed? IMHO we may end up with a correct API ... that no-one will use.

With all that in mind, I believe a single `Uri\Query` class should be used. Its goals should be:

- to be immutable
- to store the query in its decoded form.
- to manipulate and change the data in a consistent way.

Decoding/encoding should happen at the object boundaries, but everything inside the object should be done on decoded data. Since no algorithm guarantees preserving the encoding during a decode/encode round-trip, there is no need to try hard to do so.

This also means:

- having multiple string representations
- not having a `Uri::withQueryParams` or a `Url::withQueryParams` method. It should be left to the developer to understand which string version they need.

As a bonus, it would be nice to have a mechanism in PHP that allows the application to switch from the current `parse_str` usage to the new, improved parsing provided by the new class when populating the `$_GET` array (so that deprecating `parse_str` can be initiated in some distant future). This last remark is not mandatory, but nice to have.

So I would propose the following methods:

```php
namespace Uri {
    // takes no arguments, returns an empty object
    Query::__construct();

    // named constructor: returns a new instance from
    // PHP variables (same syntax as http_build_query)
    Query::fromVariables(array $variable): static

    // named constructor: returns a new instance from
    // a list of tuples, see the return value of Query::toTuples()
    Query::fromTuples(array $params): static

    // named constructors: return a new instance from
    // a query string; this is where decoding takes place
    Query::parseRfc1738String(string $query): ?static
    Query::parseRfc3986String(string $query): ?static
    Query::parseFormDataString(string $query): ?static
    Query::parseWhatWgString(string $query): ?static

    // String representations of the query:
    // this is where encoding should happen,
    // the internal decoded data should only be encoded here
    Query::toRfc3986String();
    Query::toRfc1738String();
    Query::toFormDataString();
    Query::toWhatWgString();

    // Tuple related methods,
    // like the ones defined by the WHATWG specification;
    // method names are changed or updated to highlight
    // the immutability of the modifying methods
    Query::toTuples(): array<string, null|string|array<null|string>>
    Query::count(): int;
    Query::has(string $name): bool;
    Query::hasValue(string $name, null|string $value): bool;
    Query::getFirst(string $name): null|string;
    Query::getLast(string $name): null|string;
    Query::getAll(string $name): array<null|string>;

    // Tuple modifying methods
    Query::sort(): static;
    Query::withValue(string $name, null|string|array<null|string> $value): static;
    Query::append(string $name, null|string|array<null|string> $value): static;
    Query::delete(string $name): static;
    Query::deleteValue(string $name, null|string $value): static;

    // PHP variables related methods:
    // the parse_str replacement API
    Query::toVariables(): array; // returns the same array as parse_str (without mangled data)
    Query::countVariables(): int; // returns the number of variables found
    Query::hasVariable(string $variableName): bool; // tells whether the variable exists
    Query::getVariable(string $variableName): null|string|array; // returns the variable value
    Query::mergeVariable(array $variables): static // same syntax as returned by `Query::toVariables`
    Query::replaceVariable(string $variableName, null|string|int|float|array $value): static
    Query::deleteVariable(string $variableName): static
}
```

With the following changes:

- With respect to `parse_str`, no data mangling should occur on parsing:

```php
parse_str("foo.bar=baz", $params);
echo $params['foo_bar']; // prints "baz"
array_key_exists('foo.bar', $params); // returns false

$query = \Uri\Query::parseRfc1738String("foo.bar=baz");
$query->getVariable("foo.bar"); // returns "baz"
$query->hasVariable("foo_bar"); // returns false
```

- With respect to `http_build_query`:
  - Only accept scalar values, `null`, and `array`. If an object or a resource is detected, a `ValueError` should be thrown.
```php
echo http_build_query(['a' => tmpfile()]); // prints ''
\Uri\Query::fromVariables(['a' => tmpfile()]); // throws a ValueError
```

  - Remove the addition of indices if the `array` is a list.

```php
echo http_build_query(['a' => [3, 5, 7]]); // prints a%5B0%5D=3&a%5B1%5D=5&a%5B2%5D=7
\Uri\Query::fromVariables(['a' => [3, 5, 7]])->toRfc1738String(); // returns a%5B%5D=3&a%5B%5D=5&a%5B%5D=7
```

Best regards,
Ignace
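P.S. To make the tuple model mentioned above a bit more concrete, here is a minimal userland sketch. It assumes RFC 3986 style decoding via `rawurldecode` (so "+" is not treated as a space). The function names `query_to_tuples` and `tuples_to_query` are hypothetical; they are not part of ext/uri or of the RFC, and this is only an illustration of the idea, not a suggested implementation.

```php
<?php
// Hypothetical helper (not part of ext/uri): split a query string into a
// list of [name, value] pairs, with no bracket interpretation and no
// renaming of keys. Decoding happens here, at the boundary.
function query_to_tuples(string $query): array
{
    $tuples = [];
    if ($query === '') {
        return $tuples;
    }
    foreach (explode('&', $query) as $pair) {
        // A pair without "=" has a name but no value (null).
        $parts = explode('=', $pair, 2);
        $name = rawurldecode($parts[0]);
        $value = isset($parts[1]) ? rawurldecode($parts[1]) : null;
        $tuples[] = [$name, $value];
    }
    return $tuples;
}

// Hypothetical helper: encode the decoded tuples back into a query string.
// Encoding happens only here, at the boundary.
function tuples_to_query(array $tuples): string
{
    $pairs = [];
    foreach ($tuples as [$name, $value]) {
        $pairs[] = $value === null
            ? rawurlencode($name)
            : rawurlencode($name) . '=' . rawurlencode($value);
    }
    return implode('&', $pairs);
}

$tuples = query_to_tuples('foo.bar=baz&a=1&a=2&flag');
$string = tuples_to_query($tuples);
```

Repeated names such as `a` stay as separate pairs, and `foo.bar` keeps its dot, which is the main difference from what `parse_str` produces.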
