On Fri, Sep 15, 2023, at 6:17 PM, Niels Dossche wrote:
> On 9/2/23 21:41, Niels Dossche wrote:
>> Hello internals
>>
>> I'm opening the discussion for my RFC "DOM HTML5 parsing and serialization
>> support".
>> https://wiki.php.net/rfc/domdocument_html5_parser
>>
>> Kind regards
>> Niels
>
>
> Hi internals
>
> I'd like to announce a change to the RFC. The new RFC version is 0.5.1,
> the old one was 0.4.0.
> The diff can be viewed via the revision history button on the right.
>
> I had a productive discussion with Tim and Arne about the class hierarchy.
> Here's a summary of the changes and the rationale.
>
> Until now, the RFC specified that DOM\HTML5Document extends DOMDocument.
> However, as we're introducing a new class anyway, we believe we should
> take the opportunity to improve the API.
> We have the following concerns:
> a) It's a bit of an awkward class hierarchy. *If* we hypothetically
> would want to get rid of DOMDocument in the far far future, we can't
> easily do that.
> b) API is messy. Some methods are useless for HTML5Document. E.g.:
> validate(), loadXML(), loadXMLFile(). They can be a source of confusion.
> c) The fact that you can pass HTML5Document to methods accepting
> DOMDocument may result in unexpected behaviour when the method expects
> a particular behaviour. It would be better if developers could "opt-in"
> to accepting both DOMDocument and HTML5Document in a method using a
> common base class.
> d) The properties set by DOMDocument's constructor are overridden by
> load methods, which is surprising. That's even mentioned as the second
> top comment on https://www.php.net/manual/en/domdocument.loadxml.php.
> Furthermore, the XML version argument of the constructor is even
> useless for HTML5 documents.
>
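To make point d concrete, here is a minimal sketch using only the existing
DOMDocument API. It assumes ext-dom is enabled; the document content and the
variable names are made up for illustration.

```php
<?php
// Uses only the existing DOMDocument API (ext-dom).
$doc = new DOMDocument('1.0', 'ISO-8859-1');
$encodingBeforeLoad = $doc->encoding; // the value passed to the constructor

// Loading replaces the document, including the properties set above.
$doc->loadXML('<?xml version="1.0" encoding="UTF-8"?><root/>');
$encodingAfterLoad = $doc->encoding; // now taken from the XML declaration

var_dump($encodingBeforeLoad, $encodingAfterLoad);
```

The constructor argument is silently replaced by whatever the loaded document
declares, which is the surprise the factory methods are meant to avoid.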
> So we propose the following changes to the RFC.
>
> We'll add a common abstract base class DOM\Document (name taken from
> the DOM spec & Javascript world).
> DOM\Document contains the properties and abstract methods common to
> both HTML and XML documents.
> Examples of what it includes/excludes:
> * includes: firstElementChild, lastElementChild, ...
> * excludes: xmlStandalone, xmlVersion, validate(), ...
> Then we'll have two subclasses: DOM\HTMLDocument (previously we called
> this DOM\HTML5Document) and DOM\XMLDocument. We dropped the 5 from the
> name to be more resilient to version changes and match the DOM spec
> name.
> DOMDocument will also use DOM\Document as a base class to make it
> interchangeable with the new classes.
>
> The above would solve points a, b, and c.
> To solve point d, we can use "factory methods":
> This means HTMLDocument's constructor will be made private, and instead
> we'll have three static methods that create a new instance:
> - HTMLDocument::fromHTMLString(string $xml): HTMLDocument;

That should be string $html, yes?

> - HTMLDocument::fromHTMLFile(string $filename): HTMLDocument;
> - HTMLDocument::fromEmptyDocument(string $encoding="UTF-8"):
> HTMLDocument;
>
>
> Or to put it in PHP code:
>
> ```
> namespace DOM {
>     // The base abstract document class
>     abstract class Document extends DOM\Node implements DOM\ParentNode {
>         /* all properties and methods that are common and sensible for both
>            XML & HTML documents */
>     }
>
>     class XMLDocument extends Document {
>         /* insert specific XML methods and properties (e.g. xmlVersion,
>            validate(), ...) here */
>
>         private function __construct() {}
>
>         public static function fromEmptyDocument(string $version = "1.0", string $encoding = "UTF-8");
>         public static function fromFile(string $path);
>         public static function fromString(string $source);
>     }
>
>     class HTMLDocument extends Document {
>         /* insert specific Html methods and properties here */
>
>         private function __construct() {}
>
>         public static function fromEmptyDocument(string $encoding = "UTF-8");
>         public static function fromFile(string $path);
>         public static function fromString(string $source);
>     }
> }
>
> class DOMDocument extends DOM\Document {
>     /* Keep methods, properties, and constructor the same as they are now */
> }
> ```
>
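For anyone who wants to play with the shape of this, below is a rough,
runnable sketch of the pattern (private constructor, named factories, common
abstract base). All class and function names in it (SketchDocument,
SketchHTMLDocument, SketchXMLDocument, describeDocument, htmlEncoding, kind)
are hypothetical stand-ins and not the proposed ext-dom classes; the factory
bodies are placeholders. It assumes PHP 8.1 or later.

```php
<?php
// Hypothetical stand-ins for illustration; these are NOT the ext-dom classes.
abstract class SketchDocument
{
    abstract public function kind(): string;
}

final class SketchHTMLDocument extends SketchDocument
{
    // Private: instances can only come from the named factory below.
    private function __construct(public readonly string $encoding) {}

    public static function fromEmptyDocument(string $encoding = "UTF-8"): self
    {
        return new self($encoding);
    }

    public function kind(): string
    {
        return "html";
    }
}

final class SketchXMLDocument extends SketchDocument
{
    private function __construct(
        public readonly string $version,
        public readonly string $encoding,
    ) {}

    public static function fromEmptyDocument(string $version = "1.0", string $encoding = "UTF-8"): self
    {
        return new self($version, $encoding);
    }

    public function kind(): string
    {
        return "xml";
    }
}

// Opts in to both document kinds via the common base class (point c).
function describeDocument(SketchDocument $doc): string
{
    return $doc->kind();
}

// Accepts HTML documents only; passing an XML document is a TypeError.
function htmlEncoding(SketchHTMLDocument $doc): string
{
    return $doc->encoding;
}

$html = SketchHTMLDocument::fromEmptyDocument();
$xml = SketchXMLDocument::fromEmptyDocument("1.0", "ISO-8859-1");

$kinds = [describeDocument($html), describeDocument($xml)];

// Direct construction is rejected because the constructor is private.
try {
    new SketchHTMLDocument("UTF-8");
    $directConstructionBlocked = false;
} catch (Error $e) {
    $directConstructionBlocked = true;
}
```

Because every instance goes through a factory, there is no constructor state
for a later load call to override, and a method only accepts both document
kinds if it explicitly type-hints the base class.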
> We're only adding XMLDocument for completeness and API parity. It's a
> drop-in replacement for DOMDocument, and behaves exactly the same.
> The difference is that the API is on par with HTMLDocument, and the
> construction is designed to be more misuse-resistant.
> DOMDocument will NOT change, and remains for the foreseeable future.
>
> We also have to change the $ownerDocument field in DOMNode to have type
> ?DOM\Document instead of ?DOMDocument.
> Problem is that this breaks BC (but only a minor break):
> https://3v4l.org/El7Ve.
> Overriding properties is kind of useless, but if someone does it, then
> the compiler will complain loudly during compilation and it should be
> easy to fix.
>
>
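As I understand it, the break comes from property types being invariant in
PHP. A small sketch with hypothetical stand-in classes (SketchBaseDocument,
SketchLegacyDocument, SketchNode, FixedNode are made-up names playing the
roles of DOM\Document, DOMDocument and DOMNode):

```php
<?php
// Hypothetical stand-ins; they only mimic the roles of the real classes.
class SketchBaseDocument {}
class SketchLegacyDocument extends SketchBaseDocument {}

class SketchNode
{
    // After the change the property is typed with the common base class.
    public ?SketchBaseDocument $ownerDocument = null;
}

// Property types are invariant, so a redeclaration that keeps the old,
// narrower type no longer compiles:
//
//   class BrokenNode extends SketchNode
//   {
//       public ?SketchLegacyDocument $ownerDocument = null; // fatal error
//   }

// The fix is to redeclare with the new type (or drop the redeclaration).
class FixedNode extends SketchNode
{
    public ?SketchBaseDocument $ownerDocument = null;
}

$node = new FixedNode();
// Instances of the narrower class are still accepted as values.
$node->ownerDocument = new SketchLegacyDocument();
```

So only code that redeclares the property in a subclass is affected, and the
failure shows up at compile time rather than at some later point.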
> Of course, these changes mean that the discussion period will run a
> bit longer than originally foreseen.
>
> Kind regards
> Niels

This all makes sense to me, and I like this as a way forward. Nice work!

--Larry Garfield