On Fri, Jun 28, 2024, at 8:06 PM, Máté Kocsis wrote: > Hi Everyone, > > I've been working on a new RFC for a while now, and time has come to > present it to a wider audience. > > Last year, I learnt that PHP doesn't have built-in support for parsing > URLs according to any well established standards (RFC 1738 or the > WHATWG URL living standard), since the parse_url() function is > optimized for performance instead of correctness. > > In order to improve compatibility with external tools consuming URLs > (like browsers), my new RFC would add a WHATWG compliant URL parser > functionality to the standard library. The API itself is not final by > any means, the RFC only represents how I imagined it first. > > You can find the RFC at the following link: > https://wiki.php.net/rfc/url_parsing_api > > Regards, > Máté
I am all for proper data modeling of all the things, so I support this effort. Comments: * There's no need for UrlComponent to be backed. * I don't understand why UrlParser is a static class. We just had a whole big debate about that. :-) There's a couple of ways I could see it working, and I'm not sure which I prefer: 1. Better if we envision the parser getting options or configuration in the future. $url = new UrlParser()->parseUrl(): Url; 2. The named-constructor pattern is quite common. $url = Url::parseFromString() $url = Url::parseToArray(); * I... do not understand the point of having public properties AND getters/withers. A readonly class with withers, OK, a bit clunky to implement but it would be your problem in C, not mine, so I don't care. :-) But why getters AND public properties? If going that far, why not finish up clone-with and then we don't need the withers, either? :-) * Making all the parameters to Url required except port makes little sense to me. User/pass is more likely to be omitted 99% of the time than port. In practice, most components are optional, in which case it would be inaccurate to not make them nullable. Empty string wouldn't be quite the same, as that is still a value and code that knows to skip empty string when doing something is basically the same as code that knows to skip nulls. We should assume people are going to instantiate this class themselves often, not just get it from the parser, so it should be designed to support that. * I would not make Url final. "OMG but then people can extend it!" Exactly. I can absolutely see a case for an HttpUrl subclass that enforces scheme as http/https, or an FtpUrl that enforces a scheme of ftp, etc. Or even an InternalUrl that assumes the host is one particular company, or something. (If this sounds like scope creep, it's because I am confident that people will want to creep this direction and we should plan ahead for it.) * If the intent of the withers is to mimic PSR-7, I don't think it does so effectively. Without the interface, it couldn't be a drop-in replacement for UriInterface anyway. And we cannot extend it to add the interface if it's final. Widening the parameters in PSR-7 interfaces to support both wouldn't work, as that would be a hard-BC break for any existing implementations. So I don't really see what the goal is here. * If we ever get "data classes", this would be a good candidate. :-) * Crazy idea: new UriParser(HttpUrl::class)->parse(string); To allow a more restrictive set of rules. Or even just to cast the object to that child class. --Larry Garfield