Hi Paul,

> The Rfc3986\Uri `raw()` methods present a departure from existing
userland expectations when working with URIs. No existing URI package that
I'm aware of retains the normalized values as their "main" values; the
values are generally retained-as-given (i.e. "raw"). Nor do they afford
getting two versions of the retained values (one raw, one normalized).

As a maintainer of a userland URI package, I disagree with this approach. I
believe that offering both raw and normalized methods in a single class,
while a new approach in PHP, also gives a better representation of URIs in
general. The current approach in userland mixes raw and half-normalized
components, as well as the RFC 3986 and RFC 3987 specifications, which
leaves ambiguity around normalization, input, constructors, and what needs
to be encoded where and when. This proposal has successfully avoided that
by providing separate raw and normalized methods.
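
To make the point concrete, here is a minimal sketch of one object holding
the components as given and deriving the normalized form on demand. The
`SketchUri` class and its constructor are hypothetical and are not the
RFC's implementation; only the `getPath()`/`getRawPath()` naming follows the
discussion. It assumes PHP 8.1+ and covers only case and percent-encoding
normalization, not dot-segment removal.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, NOT the RFC's implementation: components are stored
// exactly as given, and the normalized form is computed when requested.
final class SketchUri
{
    public function __construct(
        private readonly string $host,
        private readonly string $path,
    ) {}

    public function getRawHost(): string { return $this->host; }

    // Hosts are case-insensitive, so the normalized form is lowercase.
    public function getHost(): string { return strtolower($this->host); }

    public function getRawPath(): string { return $this->path; }

    // Decode percent-encoded unreserved characters and uppercase the hex
    // digits of every escape that must stay encoded.
    public function getPath(): string
    {
        return preg_replace_callback(
            '/%([0-9A-Fa-f]{2})/',
            static function (array $m): string {
                $char = chr((int) hexdec($m[1]));
                return preg_match('/^[A-Za-z0-9\-._~]$/', $char) === 1
                    ? $char
                    : '%' . strtoupper($m[1]);
            },
            $this->path
        ) ?? $this->path;
    }
}

$uri = new SketchUri('EXAMPLE.com', '/%7Euser/a%2fb');
echo $uri->getRawPath(), "\n"; // /%7Euser/a%2fb
echo $uri->getPath(), "\n";    // /~user/a%2Fb
```

The caller always knows which form it is getting, and nothing is lost
between input and output.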

> - fulfill existing userland expectations;

Existing userland expectations are mostly built around `parse_url`, which is
one of the reasons the RFC exists: to improve the status quo and to
introduce parsers in PHP that are valid against recognized URI
specifications. Yes, some adaptation will be needed to use them in userland,
but I believe this work is easy to do, speaking from the POV of a URI
package maintainer.

> - replace the toString()/toRawString() with a single idiomatic
__toString() in each class;

For all the reasons explained in the RFC, adding a `__toString` method is a
bad architectural design for a URI. There are so many ways to represent a
URI that having a `__toString` for string representation gives a false
sense that there is only one true representation of a given URI, which is
not true. A URI can be normalized or raw, and can have different
representations depending on the context in which it will be used. So again,
I believe the RFC made the right call in not implementing the Stringable
interface: this forces the developer to make an explicit choice, or to wrap
the value object in a dedicated representational class or method that uses
the exposed raw and normalized representations of each component to produce
the expected URI representation.
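
As a rough illustration of such a representational layer, assuming a
hypothetical `UriParts` value object (not an RFC class) that exposes both
forms of each component, each context gets its own explicitly named
function (PHP 8.1+):

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a URI value object that exposes both the raw
// and the normalized form of each component (names are illustrative only).
final class UriParts
{
    public function __construct(
        public readonly string $scheme,
        public readonly string $rawHost,
        public readonly string $host,
        public readonly string $rawPath,
        public readonly string $path,
    ) {}
}

// One explicit function per context, instead of a single __toString().
function forComparison(UriParts $u): string
{
    // Normalized components: suitable for equality checks or cache keys.
    return $u->scheme . '://' . $u->host . $u->path;
}

function asReceived(UriParts $u): string
{
    // Raw components: reproduces what the caller originally supplied.
    return $u->scheme . '://' . $u->rawHost . $u->rawPath;
}

$u = new UriParts('https', 'EXAMPLE.com', 'example.com', '/%7Euser', '/~user');
echo forComparison($u), "\n"; // https://example.com/~user
echo asReceived($u), "\n";    // https://EXAMPLE.com/%7Euser
```

With a single `__toString()`, one of these two outputs would silently become
"the" string form of the URI.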

> - move normalization logic into the NormalizedUri class.

The classes follow specifications that describe how normalization should be
performed. Why would you split these responsibilities across other classes?
What would be the added value?

Again, I understand this is new code and current URI packages, mine
included, will have to adapt, but in the long run I believe the proposed
API is more predictable and easier to reason about. To quote someone:
"Comfort and the fear of change are the greatest enemies of success."

Best regards,
Ignace Nyamagana Butera


On Mon, Apr 28, 2025 at 9:53 PM Paul M. Jones <pmjo...@pmjones.io> wrote:

> Hi Maté and all,
>
> > On Apr 27, 2025, at 16:47, Máté Kocsis <kocsismat...@gmail.com> wrote:
> >
> > Hi Tim,
> ...
> >> So it seems to be safer to use the naming without the `raw` and then in
> >> the documentation explain what happens with useful examples, just like
> >> the RFC already does.
> >
> > We discussed this off the list, and the recommendation made sense to me
> at last.
>
> I am glad to see it!
>
> * * *
>
> Removing the `raw()` methods from the Whatwg\Url class opens up another
> opportunity.
>
> The Rfc3986\Uri `raw()` methods present a departure from existing userland
> expectations when working with URIs. No existing URI package that I'm aware
> of retains the normalized values as their "main" values; the values are
> generally retained-as-given (i.e. "raw"). Nor do they afford getting two
> versions of the retained values (one raw, one normalized).
>
> This might be solved by renaming the Rfc3986\Uri methods so that the
> "main" methods return the raw values, and the alternative methods return
> the normalized versions. For example, getPath() would become
> getNormalizedPath(), and getRawPath() would become getPath().
>
> But that's pretty verbose, and on considering it further, I think
> there are two classes combined inside Rfc3986\Uri.
>
> Proposal:
>
> Instead of a single Rfc3986\Uri class that tries to hold *both* raw *and*
> normalized values and logic at the same time, introduce a NormalizedUri
> class to operate with normalized values, and treat the current Uri class as
> operating with raw values. That would, among other things:
>
> - fulfill existing userland expectations;
> - eliminate the getRaw() methods;
> - replace the toString()/toRawString() with a single idiomatic
> __toString() in each class;
> - move normalization logic into the NormalizedUri class.
>
> Optionally, there could be one additional method on one or both classes,
> toNormalizedUri(), to create and return a normalized instance. For Uri the
> return would be a new NormalizedUri; for NormalizedUri, the return would
> either be itself ($this) or a clone of itself.
>
> If the RFC pursues that approach, it will also lend itself to either an
> abstract they each extend or (preferably) an interface they each implement.
> If an interface, I opine it should be called Uri; the current Uri class
> might become RawUri (with NormalizedUri not needing a rename).
>
> Thoughts?
>
>
> -- pmj
>
