On 27/03/2025 22:04, Máté Kocsis wrote:
Hi Ignace,
While implementing the polyfill I am finding easier DX wise to
make the constructor private and use instead named constructors
for instantiation. I would be in favor of
`Uri::parse` and `Uri::tryParse` like it is done currently with
Enum and the `from` and `tryfrom` named constructors.
My reasoning is as follow:
there's no right way or wrong way to instantiate an URI there are
only contexts. While the parse method is all about parsing a
string, one could legitimately use other named constructors like
`Uri::fromComponents` which would take for instance the result of
parse_url to build a new URI. This can become handy in the case of
RFC3986 URI if you need to create an new URI not related to the
http scheme and that do not use all the components like the email,
data or FTP schemes.
By allowing creating URI based on their respective components
value you make it easier for dev to use the class. Also this means
that if we want to have a balance API then a `toComponents` method
should come hand in hand with the named constructor.
I would understand if that idea to add both components related
methods is rejected, they could be implemented on userland, but
the main point was to prove that from the VO or the developer POV
in absence of a clearly defined instantiation process, having a
traditional constructor fails to convey all the different way to
create an URI.
There are a few things which came to my mind:
- Currently, the underlying C libraries don't support a
`fromComponents` feature. How I could naively imagine this to work is
that the components are recomposed to a URI string based on the
relevant algorithm (for RFC 3986:
https://datatracker.ietf.org/doc/html/rfc3986#section-5.3), and then
this string is parsed and validated. Unfortunately, I recently
realized that this approach may leave room for some kind of parsing
confusion attack, namely when the scheme is for example "https", the
authority is empty, and the path is "example.com
<http://example.com>". This will result in a https://example.com URI.
I believe a similar bug is not possible with the rest of the
components because they have their delimiters. So possibly some other
solution will be needed, or maybe adding some additional validation (?).
- Nicolas raised my awareness that if URIs didn't have a proper
constructor, then one wouldn't be able to use URI objects as parameter
default values, like below:
function (Uri $foo = new Uri('blah'))
I think this omission would cause some usability regression. For this
reason, it may make sense to have a distinguished way of instantiating
an Uri.
- I have a similar feeling for a toComponents() method as for another
named constructor instead of __construct(): I am not completely
against it, but I'm not totally convinced about it.
Máté
Hi Máté,
for RFC 3986:
https://datatracker.ietf.org/doc/html/rfc3986#section-5.3), and then
this string is parsed and validated. Unfortunately, I recently
realized that this approach may leave room for some kind of parsing
confusion attack, namely when the scheme is for example "https", the
authority is empty, and the path is "example.com
<http://example.com>". This will result in a https://example.com
URI. I believe a similar bug is not possible with the rest of the
components because they have their delimiters. So possibly some
other solution will be needed, or maybe adding some additional
validation (?).
This is not correct according to RFC3986
https://datatracker.ietf.org/doc/html/rfc3986#section-3
*When authority is present, the path must either be empty or begin with
a slash ("/") character. When authority is not present, the path cannot
begin with two slash characters ("//"). *
So in your example it should throw an Uri\InvalidUriException 🙂 for RFC3986 and
in case of the WhatwgUrl algorithm it should trigger a soft error and correct
the behaviour for the http(s) schemes.
This is also one of the many reasons why at least for RFC3986 the path component can
never be `null` but that's another discussion. Like I said having a `fromComponenta`
named constructor would allow the "removal" of the need for a UriBuilder (in
your future section) and would IMHO be useful outside of the context of the http(s)
scheme but I can understand it being left out of the current implementation it might be
brought back for future improvements.
I have one last question regarding the URI implementations which are raised by
my polyfill:
Did you also took into account the delimiters when submitting data via the
withers ? In other words is
```php
$uri->withQuery('?foo=bar');
//the same as
$uri->withQuery('foo=bar');
```
I know it is the case in of the WHATWG specification but I do not know if you kept this
behaviour in your implementation for the WhatWgUrl for the Rfc3986 or for both. I would
lean toward not accepting this "normalization" but since this is not documented
in the RFC I wanted to know what is the expected behaviour.
Thanks for the hard work