Hi Ignace,

> > The getter methods return null if the path is empty
> > (https://example.com), an empty array when the path consists of a
> > single slash (https://example.com/), and a non-empty array otherwise.

Yes, that's correct!

> Instead, I would rather always get a single type, the array as return
> value. The issue you are facing is that you want to convey via your
> return type if the path is absolute or not. But, we already have access
> to this information via the UriType Enum, at least in the case of the
> Uri\Rfc3986\Uri class.

The UriType enum in its current form is not really suitable, because it
can only distinguish relative and absolute path references ("foo" vs
"/foo"), but not absolute URIs ("https://example.com" vs
"https://example.com/"). "https://example.com" and "https://example.com/"
are both absolute URIs, and the former has an empty path.

In order to find out the correct behavior, I think we should first dig
deeper into the definition of path segments. Also, for inspiration, I
checked how similar functionality works in other languages, C# notably:

https://learn.microsoft.com/en-us/dotnet/api/system.uri.segments?view=net-10.0#system-uri-segments

Making the leading "/" its own segment feels a little bit off at first
sight (not to mention that the "/" characters are part of the segments),
because RFC 3986 specifies that path segments start after the leading "/"
due to the following ABNF rule:

    path-abempty = *( "/" segment )

That is, for URIs containing **an authority component**, the path is
either empty, or consists of a "/" followed by a segment, repeated one or
more times. Segments in turn have the following syntax:

    segment = *pchar

That is, segments are composed of zero or more characters in the "pchar"
charset (the exact values don't matter in this case).

So let's see some basic examples with absolute URIs:

"https://example.com" -> no path segments: []
"https://example.com/foo" -> one path segment "foo": ["foo"]

Consequently:

"https://example.com/" -> one path segment which is empty: [""]
"https://example.com/foo/" -> two path segments: ["foo", ""]

Then the behavior of C# starts to make some sense, at least when the path
only consists of a "/" character (IMO it doesn't make sense for other
cases like "/foo").

Now let's see what to do with path references:

"" (empty string) -> no path segments: []
"/foo" -> one path segment "foo": ["foo"]
"foo" -> one path segment "foo": ["foo"]
"foo/" -> two path segments: ["foo", ""]
"/" -> one path segment which is empty: [""]

Unfortunately, this is not all: there are a few other special cases for
absolute URIs:

"https://" -> there's an authority, but it's empty, therefore the path is
also empty, therefore no path segments -> []
"https:/" -> there's no authority, and the path is "/", therefore one
path segment which is empty -> [""]
"https:" -> there's no authority, and the path is "", therefore no path
segments -> []

As far as I can see, this behavior is completely logical and satisfies the
definitions of RFC 3986. However, one case may need disambiguation in
relation to the withPathSegments() method: "/foo" vs "foo", which both
yield ["foo"]. (P.S. the uriparser library had to use a special field for
tracking exactly these cases.)

That being said, I agree with you that the currently suggested signatures
should be changed. However, accepting an additional UriType parameter in
the withPathSegments() method wouldn't be correct, because I've just
demonstrated that the behavior doesn't depend on whether a URI is absolute
or relative, but on whether the authority component is defined or not.
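
The segment lists in the examples above can be reproduced with a small
sketch. It is written in Python rather than PHP purely to illustrate the
rules and is not the proposed ext/uri API: path_segments() is a
hypothetical helper name, and the regular expression is the
component-splitting pattern from RFC 3986, Appendix B.

```python
import re

# Component-splitting regex from RFC 3986, Appendix B; group 5 is the path.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def path_segments(uri_reference: str) -> list[str]:
    """Hypothetical helper: list the path segments of a URI reference."""
    path = _URI_RE.match(uri_reference).group(5)
    if path == "":
        # Empty path: no segments at all.
        return []
    if path.startswith("/"):
        # Segments start after the leading "/", so drop it (once).
        path = path[1:]
    # Every remaining "/" separates two segments; a trailing "/" therefore
    # produces a final empty segment.
    return path.split("/")


if __name__ == "__main__":
    for ref in ["https://example.com", "https://example.com/", "/foo", "foo",
                "foo/", "https://", "https:/", "https:"]:
        print(repr(ref), "->", path_segments(ref))
```

Note that "/foo" and "foo" both come out as ["foo"], which is exactly the
case that needs disambiguation when going in the other direction.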

So my alternative idea for disambiguating the above mentioned case is the
following: adding a 2nd parameter $addLeadingSlashForNonEmptyRelativeUri
to the withPathSegments() method (I know this param name is insanely long,
so I'm happy to get recommendations), and then a leading slash would be
added to the path if and only if all the 3 criteria are satisfied:

- the $addLeadingSlashForNonEmptyRelativeUri boolean parameter is true
- the first item in the $pathSegments array parameter is non-empty
- the target URI is relative

This means that calling $uri->withPathSegments(["", "foo"], false) and
$uri->withPathSegments(["foo"], true) would result in the same path
reference ("/foo") when $uri doesn't have an authority.

I'm fine with bikeshedding/fine-tuning these rules, but I do think we
should go with something along the lines of this.

> For the Uri\WhatWg\Uri the information is less crucial as the validation
> and normalization rules of the WHATWG specifications will autocorrect
> the path if needed.

Yes, true.

Máté
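
P.S. A rough sketch of the three leading-slash criteria, again in Python
and only as an illustration. build_path() and its parameter names are
hypothetical, and it assumes the segments are simply joined with "/"
before the rule is applied, which the proposal above does not spell out;
cases not listed in this mail (for example [""]) are left undefined here.

```python
def build_path(segments: list[str], add_leading_slash: bool,
               target_is_relative: bool) -> str:
    """Hypothetical model of the path that withPathSegments() would set."""
    # Assumption: segments are joined with "/" and nothing else is added.
    path = "/".join(segments)
    # A leading slash is added only if all three criteria hold:
    # the flag is true, the first segment is non-empty, and the target
    # URI is relative.
    if (add_leading_slash and target_is_relative
            and segments and segments[0] != ""):
        path = "/" + path
    return path


if __name__ == "__main__":
    # Both calls describe the same path reference for a relative target.
    print(build_path(["", "foo"], False, True))
    print(build_path(["foo"], True, True))
```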
