Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API

Máté Kocsis Mon, 05 May 2025 14:33:48 -0700

Hi Paul,

> I would not presume that the dedicated value objects are what "makes the
> [Rowbot] library much slower" than the RFC -- instead, my first intuition
> is that the *parsing* operations are slower in userland than in C, and are
> primarily responsible for the comparative slowness.
>
> Speedwise, creation of multiple objects from the parsed results would be a
> rounding error compared to the parsing itself.

Yes, I may have arrived at the wrong conclusion from the right facts: the
Rowbot library uses objects not just to represent the components, but also
for the parser states and other things, whereas in the C library, parsing is
just an enormous switch-case. I know that instantiating objects doesn't take
a lot of time, but I guess the performance difference between nicely
written, fully OO PHP code and optimized C code becomes very noticeable over
a large number of iterations. Anyway, I shouldn't have tried to compare the
performance of the two solutions, since it's not really a fair comparison,
and it's not the main point.

> I think that's fair. The main thing that stands out to me is not the
> Scheme, Host, etc. value objects, but that the RFC presents no UrlRecord --
> which is very definitely part of the WHATWG-URL specification. That is, from
> reading the spec, I'd expect to see a UrlRecord, and a Url composed from it.

I believe the UrlRecord is a minor detail of the specification that can be
omitted without sacrificing anything useful: having a record in addition to
the URL class doesn't bring much to the table. For similar reasons, the RFC
doesn't implement the WHATWG getters either, and the pure components are
exposed instead (the "Component retrieval" section covers this). So the RFC
does not implement the entire API prescribed by the WHATWG URL spec, but it
accurately follows the parsing details -- which is the main benefit in my
opinion.
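
To make the getter point concrete, here is a small TypeScript sketch built on
the standard `URL` class, which implements the WHATWG getters. The
`pureComponents` helper and the `PureComponents` shape are hypothetical names
used only for illustration; they approximate "pure" as "delimiters stripped"
and are not the RFC's API.

```typescript
// WHATWG getters keep their delimiters: protocol is "https:", search is
// "?q=1", hash is "#top". The helper below is hypothetical and strips them.
interface PureComponents {
  scheme: string;
  host: string;
  path: string;
  query: string | null;
  fragment: string | null;
}

function pureComponents(input: string): PureComponents {
  const url = new URL(input);
  return {
    scheme: url.protocol.replace(/:$/, ""),
    host: url.hostname,
    path: url.pathname,
    // The getters return "" for both a missing and an empty query/fragment,
    // so this sketch cannot tell those two cases apart either.
    query: url.search === "" ? null : url.search.slice(1),
    fragment: url.hash === "" ? null : url.hash.slice(1),
  };
}

console.log(pureComponents("https://example.com/docs?q=1#top"));
```

The comment about empty versus missing components is one reason a parser
might prefer to expose the components directly rather than derive them from
the getters.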


> Meanwhile, AFAICT, neither Rowbot nor the RFC provide a percent *en*coding
> mechanism, for consumers to put together properly-encoded values.
>
> Have I missed it in the RFC, or is it somehow not necessary, or something
> else?

WHATWG parsing usually performs percent-encoding automatically (even if soft
errors may be triggered during the process), so it was not a top priority for
me just yet. I definitely want to include some sort of percent-encoding
support in the follow-up I plan, though. In any case, thanks for raising
awareness of this topic.
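
As a rough illustration of both sides of this point, the TypeScript sketch
below uses the standard `URL` class (a WHATWG implementation). The function
names `encodeOnParse` and `withQueryValue` are hypothetical, and the second
one only shows the kind of explicit encoding step a consumer may still need
when assembling a component from raw input.

```typescript
// Parsing alone percent-encodes characters that are not allowed as-is,
// such as the space in the path and the non-ASCII character in the query.
function encodeOnParse(input: string): string {
  return new URL(input).href;
}

// Assembling a query from raw values still needs an explicit encoding step:
// without it, an "&" inside the value would be read as a pair separator.
function withQueryValue(base: string, key: string, value: string): string {
  const url = new URL(base);
  url.search = encodeURIComponent(key) + "=" + encodeURIComponent(value);
  return url.href;
}

console.log(encodeOnParse("https://example.com/a b?q=é"));
// https://example.com/a%20b?q=%C3%A9

console.log(withQueryValue("https://example.com/search", "q", "rock & roll"));
// https://example.com/search?q=rock%20%26%20roll
```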

> Because it is part of the WHATWG-URL spec, I think it deserves first-class
> treatment in this RFC ...

Having yet another class in the proposal would open up a whole lot of new
discussion. We should draw the line somewhere so as not to waste everyone's
time, or the PHP Foundation's budget, any longer should the RFC fail for any
reason. I draw the line here, since this is a nice-to-have feature, and we
have a meaningful set of functionality even without it.


> Which leads to my last point: I would really like to see at least two
> separate RFCs here. They'd be a lot easier to review and critique that way:
>
> - one for dealing with URIs as they exist now, especially one that
> honors the ways-of-working that exist in userland; and,
> - one for dealing with WHATWG-URL in its entirety, with all its
> differences (some subtle, some not) from URIs.
>
> I can see arguments for either one being the "base" on which the other
> would build.
>

I might have agreed to pursue two separate RFCs a few months ago, but not
now, so close to the end. I should mention, though, that the original RFC
tried to deal with WHATWG URLs only; RFC 3986 URIs were added later, due to
public demand. Perhaps I should have split things up when I added RFC 3986
support. However, working on both specifications in parallel helped me
understand a lot of the subtle differences between them, and once those
differences had surfaced, the final API design could reflect and tackle
them.

Regards,
Máté
