Hi Ignace,

> > The getter methods return null if the path is empty (https://example.com),
> > an empty array when the path consists of a single slash
> > (https://example.com/), and a non-empty array otherwise.
>

Yes, that's correct!


> Instead, I would rather always get a single type, the array as return
> value. The issue you are facing is that
> you want to convey via your return type if the path is absolute or not.
> But, we already have access to this
> information via the UriType Enum, at least in the case of the
> Uri\Rfc3986\Uri class.
>

The UriType enum in its current form is not really suitable, because it can
only distinguish relative and absolute
path references ("foo" vs "/foo"), but not absolute URIs
("https://example.com" vs "https://example.com/").
"https://example.com" and "https://example.com/" are both absolute URIs,
and the former one has an empty path.

In order to find out the correct behavior, I think we should first try to
dig deeper into the definition of path segments.

Also, in order to have some inspiration, I checked how similar
functionality works in other languages, C# notably:
https://learn.microsoft.com/en-us/dotnet/api/system.uri.segments?view=net-10.0#system-uri-segments
Making the leading "/" its own segment feels a little bit off at first
sight (not to mention that the "/" characters
are part of the segments), because RFC 3986 specifies that path segments
start after the leading "/" due to the
following ABNF rule:

path-abempty  = *( "/" segment )

That is, for URIs containing **an authority component**, the path is either
empty, or contains a "/" followed by a segment
one or multiple times. Then segments have the following syntax:

segment       = *pchar

That is, segments are composed of zero or multiple characters in the
"pchar" charset (the exact values don't matter
in this case). So let's see some basic examples with absolute URIs:

"https://example.com" -> no path segments: []
"https://example.com/foo" -> one path segment "foo": ["foo"]

Consequently:

"https://example.com/" -> one path segment which is empty: [""]
"https://example.com/foo/" -> two path segments: ["foo", ""]

Then the behavior of C# starts to make some sense - at least when the path
only consists of a "/" character (IMO it
doesn't make sense for other cases like  "/foo").

Now let's see what to do with path references:

"" (empty string) -> no path segments: []
"/foo" -> one path segment "foo": ["foo"]
"foo" -> one path segment "foo": ["foo"]
"foo/" -> two path segments: ["foo", ""]
"/" -> one path segment which is empty: [""]
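
The mapping above can be sketched in a few lines of PHP. This is only an
illustration: pathToSegments() is a hypothetical helper, not part of the
proposed API, and it takes the already extracted path component as input.

```php
<?php
// Hypothetical helper (not part of the proposed API): maps a path
// component to the list of segments described above.
function pathToSegments(string $path): array
{
    // An empty path has no segments at all.
    if ($path === '') {
        return [];
    }

    // A single leading "/" only introduces the first segment,
    // it is not part of any segment itself.
    if ($path[0] === '/') {
        $path = substr($path, 1);
    }

    // Every remaining "/" separates two (possibly empty) segments.
    return explode('/', $path);
}
```

Note that this sketch returns ["foo"] for both "/foo" and "foo", so the
leading slash cannot be recovered from the segment list alone.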

Unfortunately, this is not all, there are a few other special cases for
absolute URIs:

"https://" -> means that there's an authority, but it's empty, therefore
the path is also empty, therefore no path segments  -> []
"https:/" -> means that there's no authority, and the path is "/",
therefore one path segment which is empty  -> [""]
"https:" -> means that there's no authority, and the path is "", therefore
no path segments  -> []
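
To make these three cases easier to check, here is a rough sketch that
splits a URI reference with the non-validating regular expression from
RFC 3986 Appendix B. The function name splitAuthorityAndPath() is made up
for this example, and the regex does not validate anything, it only
separates the components.

```php
<?php
// Hypothetical helper: reports whether an authority is defined and
// returns the raw path, using the RFC 3986 Appendix B regex.
function splitAuthorityAndPath(string $uriReference): array
{
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

    // PREG_UNMATCHED_AS_NULL lets us tell "no authority" (null)
    // apart from "empty authority" (group 3 matched "//").
    preg_match($pattern, $uriReference, $m, PREG_UNMATCHED_AS_NULL);

    return [
        'hasAuthority' => ($m[3] ?? null) !== null,
        'path' => $m[5] ?? '',
    ];
}

var_dump(splitAuthorityAndPath('https://')); // authority defined, path ""
var_dump(splitAuthorityAndPath('https:/'));  // no authority, path "/"
var_dump(splitAuthorityAndPath('https:'));   // no authority, path ""
```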

As far as I can see, this behavior is completely logical and satisfies the
definitions of RFC 3986. However, one case
may possibly need disambiguation in relation to the withPathSegments()
method: "/foo" vs "foo". (P.S. the uriparser library
had to use a special field for tracking exactly these cases.)

That being said, I agree with you that the currently suggested signatures
should be changed. However, accepting an
additional UriType parameter by the withPathSegments() method wouldn't be
correct, because I've just demonstrated
that the behavior doesn't depend on whether a URI is absolute or relative,
but whether the authority component is
defined or not.

So my alternative idea for disambiguating the above mentioned case is the
following: adding a 2nd parameter
$addLeadingSlashForNonEmptyRelativeUri to the withPathSegments() method (I
know this param name is insanely long,
so I'm happy to get recommendations), and then a leading slash would be
added to the path if and only if all the 3
criteria are satisfied:

- the $addLeadingSlashForNonEmptyRelativeUri boolean parameter is true
- the first item in the $pathSegments array parameter is non-empty
- the target URI is relative

This means that calling $uri->withPathSegments(["", "foo"], false) and
$uri->withPathSegments(["foo"], true) would result
in the same path reference ("/foo") when $uri doesn't have an authority.
I'm fine with bikeshedding/fine-tuning these rules,
but I do think we should go with something along the lines of this.
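
A minimal sketch of the rule as I described it, only to make the two
examples concrete. joinPathSegments() and its $targetIsRelative flag are
invented for this illustration (the real method would read that state from
the URI itself), and the sketch deliberately ignores what should happen
for URIs that have an authority.

```php
<?php
// Hypothetical illustration of the proposed rule, not the actual
// withPathSegments() implementation.
function joinPathSegments(
    array $pathSegments,
    bool $addLeadingSlashForNonEmptyRelativeUri,
    bool $targetIsRelative
): string {
    // Segments are joined with "/", so a leading empty segment
    // already produces a leading slash on its own.
    $path = implode('/', $pathSegments);

    // Criterion 2: the first item exists and is non-empty.
    $firstIsNonEmpty = ($pathSegments[0] ?? '') !== '';

    // The slash is added only when all three criteria hold.
    if ($addLeadingSlashForNonEmptyRelativeUri && $firstIsNonEmpty && $targetIsRelative) {
        return '/' . $path;
    }

    return $path;
}
```

With this sketch, joinPathSegments(["", "foo"], false, true) and
joinPathSegments(["foo"], true, true) both return "/foo", while
joinPathSegments(["foo"], false, true) returns "foo".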

> For the Uri\WhatWg\Uri the information is less crucial as the validation
> and normalization rules of the WHATWG specifications will autocorrect the
> path if needed.
>

Yes, true.

Máté
