Hi Paul,

## Rowbot
>
> (None of the classes are readonly or final; these look to hew closely to
> the WHATWG-URL spec.)
>
> A BasicURLParser class:
>
> - affords relative parsing capability and an option parameter for the
> target URLRecord
> - returns a URLRecord
>
> A URLRecord class:
>
> - public mutable properties for the URL components
> - $scheme is a Scheme implementation with equals() and other is...()
> methods
> - $host is a HostInterface (and implementations) with equals() and other
> is...() methods
> - $path is a PathInterface (and PathList implementation) with PathSegment
> manipulation methods
> - setUsername() and setPassword() mutators
> - serializing
> - getOrigin(), includesCredentials(), isEqual()
>
> A URL class:
>
> - Composed of a URLRecord and a URLSearchParams object
> - Constructor takes a string, parses it to a URLRecord, and retains the
> URLRecord
> - a static parse() method with relative parsing, as a convenience method
> - __toString() and toString() return the serialized URLRecord
> - Virtual properties for $href, $origin, $protocol, $username, $password,
> $host, $hostname, $port, $pathname, $search, $searchParams, $hash
> - Mutability of virtual properties via magic __set()
> - Readability of virtual properties via magic __get()
>

I like some of the solutions this library uses, such as the dedicated
value objects for certain components (Scheme, HostInterface, PathInterface),
but these very features are what make the implementation extremely slow
compared to the implementation the RFC proposes. I didn't dig into the
details when I performed a quick benchmark last week, so I can only assume
that the heavy use of objects makes the library much slower than what's
possible even for a userland library (obviously, an internal C
implementation will always be faster). According to my results, the RFC's
implementation was **two orders of magnitude** faster than the Rowbot
library when parsing a very basic "https://example.com" URL 1000 times
(~0.002 sec vs ~0.56 sec).
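
For anyone who wants to reproduce the comparison, a micro-benchmark of
roughly this shape is enough (a sketch, not my exact script; the RFC side
runs the same loop with the RFC's proposed parser in place of the Rowbot
class):

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Rowbot\URL\URL;

// Parse the same trivial URL 1000 times, matching the numbers quoted above.
$start = hrtime(true);
for ($i = 0; $i < 1000; $i++) {
    $url = new URL('https://example.com');
}
printf("%.4f sec\n", (hrtime(true) - $start) / 1e9);
```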

What I want to say with this is that it's perfectly fine to optimize a
userland library for ergonomics and for advanced OOP, but an internal
implementation should also keep efficiency in mind besides developer
experience. That's why I don't see myself implementing separate objects
for some of the components for now. But nothing would block us from doing
so later, if we found it necessary.

I believe the most fundamental difference between the Rowbot library and
the RFC is that the RFC has native support for percent-decoding (because
most properties are accessible in two variants: raw and decoded), while
the library completely leaves this task to the user. Apart from that, the
mutable design of the library is fragile for the same reason that the
DateTime class is not safe to use in most cases, so that's definitely a
no-go for me.
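
To make the DateTime analogy concrete, the hazard is aliasing: whoever
holds a reference to a mutable object can modify it behind the back of
every other holder. This snippet only demonstrates the well-known DateTime
behavior; a mutable URL object would suffer from exactly the same problem:

```php
<?php

// DateTime::modify() mutates the receiver, so two variables pointing to
// the same object silently influence each other.
$deadline = new DateTime('2024-01-01');
$reminder = $deadline;             // both variables reference ONE object
$reminder->modify('-1 week');      // ...so this also moves the deadline

echo $deadline->format('Y-m-d');   // 2023-12-25, not 2024-01-01

// DateTimeImmutable avoids the trap by returning a new instance,
// which is why an immutable URL design is the safer choice.
$deadline = new DateTimeImmutable('2024-01-01');
$reminder = $deadline->modify('-1 week');
echo $deadline->format('Y-m-d');   // still 2024-01-01
```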

This RFC is the synthesis of almost a year of discussion and refinement,
done in collaboration with some very clever folks who have a lot of
hands-on experience with URL parsing and handling. That's why I would say
that input from Trevor Rowbotham is also welcome in the discussion
(especially his experience with some edge cases he had to deal with), but
the library itself is nowhere near widely adopted enough to qualify as
something we must definitely take into consideration when designing PHP's
new URL parsing API.


> A URLSearchParams class:
>
> - search params manipulation methods
> - implements Countable, Iterator, Stringable
> - composed of a QueryList implementation and (optionally) the originating
> URLRecord
>

I like this concept too. In fact, support for such a class is on my to-do
list and is mentioned in the "Future Scope" section. I just didn't want to
make the RFC even longer, because we already have a lot of details to
discuss.
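
For reference, usage would look something like the following. This is only
a sketch based on the WHATWG URLSearchParams semantics that the Rowbot
class implements; a future internal class would not necessarily expose the
exact same surface:

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Rowbot\URL\URLSearchParams;

$params = new URLSearchParams('a=1&b=2');
$params->append('a', '3');    // duplicate keys are allowed
$params->set('b', '20');      // replaces the existing "b" entries

echo $params->get('a');       // "1" (first match wins)
var_dump($params->has('c'));  // bool(false)

echo (string) $params;        // "a=1&b=20&a=3" -- Stringable, as quoted above
```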

Máté
