On 27/03/2025 22:04, Máté Kocsis wrote:
Hi Ignace,

    While implementing the polyfill I am finding easier DX wise to
    make the constructor private and use instead named constructors
    for instantiation. I would be in favor of

    `Uri::parse` and `Uri::tryParse` like it is done currently with
    Enum and the `from` and `tryfrom` named constructors.

    My reasoning is as follow:

     there's no right way or wrong way to instantiate an URI there are
    only contexts. While the parse method is all about parsing a
    string, one could legitimately use other named constructors like
    `Uri::fromComponents` which would take for instance the result of
    parse_url to build a new URI. This can become handy in the case of
    RFC3986 URI if you need to create an new URI not related to the
    http scheme and that do not use all the components like the email,
    data or FTP schemes.

     By allowing creating URI based on their respective components
    value you make it easier for dev to use the class. Also this means
    that if we want to have a balance API then a `toComponents` method
    should come hand in hand with the named constructor.

    I would understand if that idea to add both components related
    methods is rejected, they could be implemented on userland, but
    the main point was to prove that from the VO or the developer POV
    in absence of a clearly defined instantiation process, having a
    traditional constructor fails to convey all the different way to
    create an URI.

There are a few things which came to my mind:
- Currently, the underlying C libraries don't support a `fromComponents` feature. How I could naively imagine this to work is that the components are recomposed to a URI string based on the relevant algorithm (for RFC 3986: https://datatracker.ietf.org/doc/html/rfc3986#section-5.3), and then this string is parsed and validated. Unfortunately, I recently realized that this approach may leave room for some kind of parsing confusion attack, namely when the scheme is for example "https", the authority is empty, and the path is "example.com <http://example.com>". This will result in a https://example.com URI. I believe a similar bug is not possible with the rest of the components because they have their delimiters. So possibly some other solution will be needed, or maybe adding some additional validation (?).

- Nicolas raised my awareness that if URIs didn't have a proper constructor, then one wouldn't be able to use URI objects as parameter default values, like below:
function (Uri $foo = new Uri('blah'))
I think this omission would cause some usability regression. For this reason, it may make sense to have a distinguished way of instantiating an Uri.

- I have a similar feeling for a toComponents() method as for another named constructor instead of __construct(): I am not completely against it, but I'm not totally convinced about it.

Máté

Hi Máté,

   for RFC 3986:
   https://datatracker.ietf.org/doc/html/rfc3986#section-5.3), and then
   this string is parsed and validated. Unfortunately, I recently
   realized that this approach may leave room for some kind of parsing
   confusion attack, namely when the scheme is for example "https", the
   authority is empty, and the path is "example.com
   <http://example.com>". This will result in a https://example.com
   URI. I believe a similar bug is not possible with the rest of the
   components because they have their delimiters. So possibly some
   other solution will be needed, or maybe adding some additional
   validation (?).

This is not correct according to RFC3986 https://datatracker.ietf.org/doc/html/rfc3986#section-3


*When authority is present, the path must either be empty or begin with a slash ("/") character. When authority is not present, the path cannot begin with two slash characters ("//"). *

So in your example it should throw an Uri\InvalidUriException 🙂 for RFC3986 and 
in case of the WhatwgUrl algorithm it should trigger a soft error and correct 
the behaviour for the http(s) schemes.
This is also one of the many reasons why at least for RFC3986 the path component can 
never be `null` but that's another discussion. Like I said having a `fromComponenta` 
named constructor would allow the "removal" of the need for a UriBuilder (in 
your future section) and would IMHO be useful outside of the context of the http(s) 
scheme but I can understand it being left out of the current implementation it might be 
brought back for future improvements.

I have one last question regarding the URI implementations which are raised by 
my polyfill:

Did you also took into account the delimiters when submitting data via the 
withers ? In other words is

```php
$uri->withQuery('?foo=bar');
//the same as
$uri->withQuery('foo=bar');
```

I know it is the case in of the WHATWG specification but I do not know if you kept this 
behaviour in your implementation for the WhatWgUrl for the Rfc3986 or for both. I would 
lean toward not accepting this "normalization" but since this is not documented 
in the RFC I wanted to know what is the expected behaviour.

Thanks for the hard work


Reply via email to