On Mon, Dec 1, 2025 at 9:53 PM Máté Kocsis <[email protected]> wrote:

> Hi Everyone,
>
> I'd like to introduce my latest RFC that I've been working on for a while
> now: https://wiki.php.net/rfc/uri_followup.
>
> It proposes 5 followup improvements for ext/uri in the following areas:
> - URI Building
> - Query Parameter Manipulation
> - Accessing Path Segments as an Array
> - Host Type Detection
> - URI Type Detection
> - Percent-Encoding and Decoding Support
>
> I did my best to write an RFC that was at least as extensive as
> https://wiki.php.net/rfc/url_parsing_api had become by the end. Despite
> my efforts,
> there are still a couple things which need a final decision, or which
> need to be polished/improved. Some examples:
>
> - How to support array/object values for constructing query strings? (
> https://wiki.php.net/rfc/uri_followup#type_support)
> - How to make the UriQueryParams and UrlQueryParams classes more
> interoperable with the query string component (mainly with respect to
> percent-encoding)? (
> https://wiki.php.net/rfc/uri_followup#percent-encoding_and_decoding)
> - Exactly how the advanced percent-decoding capabilities should work? Does
> it make sense to support all the possible modes (UriPercentEncodingMode)
> for percent-decoding as well (
> https://wiki.php.net/rfc/uri_followup#percent-encoding_and_decoding_support
> )
> - etc.
>
> Regards,
> Máté
>

Hi Máté,

After thinking about it, here is my take on the Query Parameter
Manipulation part of the current proposal. Sorry for the wall of
text, but I tried to summarize my thoughts.

First of all, I tried to put myself in the shoes of a regular PHP
developer who has little to no knowledge about the different URI
specifications but has a general grasp of PHP. From that point of view
the developer knows that:

- PHP already gives access to the URI query parameters via the `$_GET`
superglobal.
- to parse a query string in PHP, they can rely on `parse_str`.
- to build a query string, they should use the `http_build_query` function.

What we also know is that the `$_GET` values are produced by the same
logic as `parse_str`, and that this logic:

- is not documented
- is PHP centric
- mangles the data
- truncates the query string

Its original goal was to allow direct conversion of the query string into
PHP variables usable in scripts, but that behaviour has since been removed
from PHP for security reasons.

`http_build_query` allows creating a query string in a more predictable
way but still exposes PHP centric behaviour:

- It uses `get_object_vars` on objects, which is counter-intuitive:

  - `iterable` structures do not all give the same result.
  - Depending on the object implementation, the result varies between
PHP versions (e.g. `DateTimeImmutable` used to be rendered before PHP 7.4;
since then it fails silently, resulting in an empty string being
generated).

- It adds "[", "]" and indices around arrays. This is PHP centric
(other languages would just repeat the array name)
- It always adds the array indices even when the array is a list which
again can lead to unexpected behaviour, even within the PHP ecosystem.

On the other hand:

- Other modern platforms, such as Java's HttpServletRequest or the WHATWG
URLSearchParams, take a completely different approach: they view the query
string as a collection of tuples (key/value pairs) whose keys can be
repeated; there is no notion of brackets. The data is preserved even
though, as you mention, the round-trip between encoding and decoding is
never guaranteed.
- We have the new HTTP QUERY method, which may or may not fall under
"should this also be managed by a putative Query class".

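To make the tuple model concrete, here is a rough sketch in plain PHP. The
`parseQueryTuples` function is a hypothetical helper of mine, not part of
ext/uri or the RFC, and it uses `urldecode` as a stand-in for whatever
decoding rules the final API would apply. It only illustrates that repeated
names and names containing a dot survive untouched, whereas
`parse_str('a=1&a=2', $out)` keeps only the last `a`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: splits a query string into an ordered list of
 * [name, value] pairs. No bracket handling, no name mangling.
 *
 * @return list<array{0: string, 1: string|null}>
 */
function parseQueryTuples(string $query): array
{
    $tuples = [];
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments such as "a=1&&b=2"
        }
        // Split on the first "=" only; a missing "=" yields a null value.
        $parts = explode('=', $pair, 2);
        $name = urldecode($parts[0]);
        $value = isset($parts[1]) ? urldecode($parts[1]) : null;
        $tuples[] = [$name, $value];
    }

    return $tuples;
}

$tuples = parseQueryTuples('a=1&a=2&foo.bar=baz&flag');
// [['a', '1'], ['a', '2'], ['foo.bar', 'baz'], ['flag', null]]
```
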
Currently, in your proposal you have 2 Query objects. This will give
the developer a lot of work to understand where and when to use which
object, and why. Is that complexity really needed? IMHO we
may end up with a correct API ... that no-one will use.

With all that in mind I believe a single `Uri\Query` should be used.
Its goal should be:

- to be immutable
- to store the query in its decoded form.
- to manipulate and change the data in a consistent way.

Decoding/encoding should happen at the object boundaries, but
everything inside the object should be done on decoded data. Since no
algorithm guarantees preserving the encoding during a decode/encode
round-trip, there is no need to try hard to do so.

This also means:

- having multiple string representations
- not having a `Uri::withQueryParams` or a `Url::withQueryParams` method.

It should be left to the developer to decide which string version they need.
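
As a minimal sketch of "decoded inside, encoded at the boundaries", assuming
PHP 8.1 or later: the `SketchQuery` class below is purely illustrative (its
name and internals are my assumptions, not the proposed `Uri\Query`), and
`rawurlencode`/`urlencode` only approximate the RFC 3986 and form-data
encoders the real class would need.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: an immutable container that keeps decoded
 * name/value pairs and encodes them only when a string is requested.
 */
final class SketchQuery
{
    /** @param list<array{0: string, 1: string|null}> $pairs decoded pairs */
    private function __construct(private readonly array $pairs)
    {
    }

    public static function fromTuples(array $pairs): self
    {
        return new self(array_values($pairs));
    }

    /** Returns a new instance; the current one is left untouched. */
    public function append(string $name, ?string $value): self
    {
        return new self([...$this->pairs, [$name, $value]]);
    }

    /** Encoding boundary: spaces become "%20". */
    public function toRfc3986String(): string
    {
        return $this->encodeWith(rawurlencode(...));
    }

    /** Encoding boundary: spaces become "+". */
    public function toFormDataString(): string
    {
        return $this->encodeWith(urlencode(...));
    }

    private function encodeWith(callable $encode): string
    {
        $parts = [];
        foreach ($this->pairs as [$name, $value]) {
            $parts[] = $value === null
                ? $encode($name)
                : $encode($name) . '=' . $encode($value);
        }

        return implode('&', $parts);
    }
}

$query = SketchQuery::fromTuples([['q', 'hello world']]);
$extended = $query->append('tag', 'a&b');

echo $extended->toRfc3986String(), PHP_EOL;  // q=hello%20world&tag=a%26b
echo $extended->toFormDataString(), PHP_EOL; // q=hello+world&tag=a%26b
echo $query->toRfc3986String(), PHP_EOL;     // q=hello%20world
```

The same decoded data yields two different strings, which is why the choice
of representation belongs to the caller rather than to a `withQueryParams`
method on the URI classes.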

As a bonus, it would be nice to have a mechanism in PHP that allows the
application to switch from the current `parse_str` usage to the new,
improved parsing provided by the new class when populating the `$_GET`
array (so that deprecating `parse_str` can be initiated in some distant
future). This last remark is not mandatory, but nice to have.

So I would propose the following methods:

```php

namespace Uri {
    //takes no arguments returns an empty object
    Query::__construct();

    // named constructor to allow
    // returning a new instance from
    // PHP variables (same syntax as http_build_query)
    Query::fromVariables(array $variable): static

    // named constructor to allow
    // returning a new instance from
    // a list of tuples see the returns
    // value of Query::toTuples()
    Query::fromTuples(array $params): static

    // named constructor to allow
    // returning a new instance from
    // query string this is where
    // decoding takes place

    Query::parseRfc1738String(string $query): ?static
    Query::parseRfc3986String(string $query): ?static
    Query::parseFormDataString(string $query): ?static
    Query::parseWhatWgString(string $query): ?static

    //String representation query
    //this is where encoding should happen
    //internal decoded data
    //should only be encoded here

    Query::toRfc3986String();
    Query::toRfc1738String();
    Query::toFormDataString();
    Query::toWhatWgString();

    // Tuple related methods
    // like the one defined by the WHATWG specifications
    // method names are changed or update to highlight
    // the immutable state for modifying methods

    Query::toTuples(): array<string, null|string|array<null|string>>
    Query::count(): int;
    Query::has(string $name): bool;
    Query::hasValue(string $name, null|string $value): bool;
    Query::getFirst(string $name): null|string;
    Query::getLast(string $name): null|string;
    Query::getAll(string $name): array<null|string>;

    // Tuple modifying methods

    Query::sort(): static;
    Query::withValue(string $name, null|string|array<null|string> $value): static;
    Query::append(string $name, null|string|array<null|string> $value): static;
    Query::delete(string $name): static;
    Query::deleteValue(string $name, null|string $value): static;

    // PHP variables related methods
    // the parse_str replacement API

    // returns the same array as parse_str (without mangled data)
    Query::toVariables(): array;
    // returns the number of variables found
    Query::countVariables(): int;
    // tells whether the variable exists
    Query::hasVariable(string $variableName): bool;
    // returns the variable value
    Query::getVariable(string $variableName): null|string|array;
    // accepts the same syntax as returned by `Query::toVariables`
    Query::mergeVariable(array $variables): static;
    Query::replaceVariable(string $variableName, null|string|int|float|array $value): static;
    Query::deleteVariable(string $variableName): static;
}
```

With the following changes:

- with respect to `parse_str`, no data should be mangled on parsing:

```php
parse_str("foo.bar=baz", $params);
echo $params['foo_bar'];             // returns "baz"
array_key_exists('foo.bar', $params); // returns false

$query = \Uri\Query::parseRfc1738String("foo.bar=baz");
$query->getVariable("foo.bar"); //returns "baz"
$query->hasVariable("foo_bar"); //returns false
```

- with respect to `http_build_query`:

- Only accept scalar values, `null`, and `array`. If an object or a
resource is detected, a `ValueError` should be thrown.

```php
echo http_build_query(['a' => tmpfile()]); // returns ''
\Uri\Query::fromVariables(['a' => tmpfile()]); // throws a ValueError
```

- Remove the addition of indices if the `array` is a list.

```php
echo http_build_query(['a' => [3, 5, 7]]);
// returns a%5B0%5D=3&a%5B1%5D=5&a%5B2%5D=7
echo \Uri\Query::fromVariables(['a' => [3, 5, 7]])->toRfc1738String();
// returns a%5B%5D=3&a%5B%5D=5&a%5B%5D=7
```
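
For anyone who wants to experiment with these two rules today, here is a
rough userland approximation, assuming PHP 8.1 or later (for
`array_is_list`). `flattenVariables` is a hypothetical helper, not the
proposed implementation. Rendering `null` as a bare name and booleans as
`1`/`0` are my own assumptions, and nested lists collapse to `a[][]`, which
loses their grouping.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: flattens PHP variables into encoded "name=value"
 * parts, using "name[]" for lists and "name[key]" for other arrays.
 * Objects and resources are rejected with a ValueError.
 *
 * @return list<string>
 */
function flattenVariables(array $variables, ?string $prefix = null): array
{
    $parts = [];
    $isList = array_is_list($variables);
    foreach ($variables as $key => $value) {
        if ($prefix === null) {
            $name = (string) $key;           // top level: plain variable name
        } elseif ($isList) {
            $name = $prefix . '[]';          // list: no index is emitted
        } else {
            $name = $prefix . '[' . $key . ']';
        }

        if (is_array($value)) {
            array_push($parts, ...flattenVariables($value, $name));
        } elseif ($value === null) {
            $parts[] = urlencode($name);     // assumption: name without "="
        } elseif (is_bool($value)) {
            $parts[] = urlencode($name) . '=' . ($value ? '1' : '0');
        } elseif (is_scalar($value)) {
            $parts[] = urlencode($name) . '=' . urlencode((string) $value);
        } else {
            throw new ValueError('Objects and resources are not supported.');
        }
    }

    return $parts;
}

echo implode('&', flattenVariables(['a' => [3, 5, 7]])), PHP_EOL;
// a%5B%5D=3&a%5B%5D=5&a%5B%5D=7
```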

Best regards,

Ignace
