Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API

Larry Garfield Fri, 28 Jun 2024 15:15:41 -0700

On Fri, Jun 28, 2024, at 8:06 PM, Máté Kocsis wrote:
> Hi Everyone,
>
> I've been working on a new RFC for a while now, and time has come to 
> present it to a wider audience.
>
> Last year, I learnt that PHP doesn't have built-in support for parsing 
> URLs according to any well established standards (RFC 1738 or the 
> WHATWG URL living standard), since the parse_url() function is 
> optimized for performance instead of correctness.
>
> In order to improve compatibility with external tools consuming URLs 
> (like browsers), my new RFC would add a WHATWG compliant URL parser 
> functionality to the standard library. The API itself is not final by 
> any means, the RFC only represents how I imagined it first.
>
> You can find the RFC at the following link: 
> https://wiki.php.net/rfc/url_parsing_api
>
> Regards,
> Máté


I am all for proper data modeling of all the things, so I support this effort.

Comments:

* There's no need for UrlComponent to be backed.

* I don't understand why UrlParser is a static class.  We just had a whole big 
debate about that. :-)

There's a couple of ways I could see it working, and I'm not sure which I 
prefer:

1. Better if we envision the parser getting options or configuration in the 
future.
$url = new UrlParser()->parseUrl(): Url;

2. The named-constructor pattern is quite common.
$url = Url::parseFromString()
$url = Url::parseToArray();

* I... do not understand the point of having public properties AND 
getters/withers.  A readonly class with withers, OK, a bit clunky to implement 
but it would be your problem in C, not mine, so I don't care. :-)  But why 
getters AND public properties?  If going that far, why not finish up clone-with 
and then we don't need the withers, either? :-)

* Making all the parameters to Url required except port makes little sense to 
me.  User/pass is more likely to be omitted 99% of the time than port.  In 
practice, most components are optional, in which case it would be inaccurate to 
not make them nullable.  Empty string wouldn't be quite the same, as that is 
still a value and code that knows to skip empty string when doing something is 
basically the same as code that knows to skip nulls.  We should assume people 
are going to instantiate this class themselves often, not just get it from the 
parser, so it should be designed to support that.

* I would not make Url final.  "OMG but then people can extend it!"  Exactly.  
I can absolutely see a case for an HttpUrl subclass that enforces scheme as 
http/https, or an FtpUrl that enforces a scheme of ftp, etc.  Or even an 
InternalUrl that assumes the host is one particular company, or something.  (If 
this sounds like scope creep, it's because I am confident that people will want 
to creep this direction and we should plan ahead for it.)

* If the intent of the withers is to mimic PSR-7, I don't think it does so 
effectively.  Without the interface, it couldn't be a drop-in replacement for 
UriInterface anyway.  And we cannot extend it to add the interface if it's 
final.  Widening the parameters in PSR-7 interfaces to support both wouldn't 
work, as that would be a hard-BC break for any existing implementations.  So I 
don't really see what the goal is here.

* If we ever get "data classes", this would be a good candidate. :-)

* Crazy idea:

new UriParser(HttpUrl::class)->parse(string);

To allow a more restrictive set of rules.  Or even just to cast the object to 
that child class.

--Larry Garfield

Re: [PHP-DEV] [RFC] [Discussion] Add WHATWG compliant URL parsing API

Reply via email to