Hi Robert

On 30/12/2023 10:25, Robert Landers wrote:
> Hi Niels,
> 
>> They are indeed going to be very similar, but at least having better return 
>> types would be good to give one particular example.
>> e.g. we currently have a lot of methods that can return an object or false. 
>> The current living DOM spec always throws exceptions instead of returning 
>> false on error which is a much cleaner API.
>> Furthermore, we have the DOMNameSpaceNode that can be returned by some 
>> methods and has been a point of confusion for static analysis tools (I did a 
>> PR on psalm to fix one of those issues).
>> That node type won't be special cased in the new classes API so the 
>> (inconsistent use of the) union of DOMAttr|DOMNameSpaceNode will go away.
> 
> Actually, I'm not sure it is supposed to be throwing exceptions (if we
> look at https://html.spec.whatwg.org/multipage/parsing.html#parse-errors);
> in fact, I'd argue there are three different ways to handle errors
> (from some experience in writing a parser from scratch):

I'm not talking about handling parser errors.
Parser errors indeed should not be handled via exceptions; they emit a warning 
and the parser continues with error recovery as described in the spec.
This was part of my HTML 5 RFC: 
https://wiki.php.net/rfc/domdocument_html5_parser

I'm talking about methods like createElement, setAttributeNode, ... that can 
fail when their arguments violate the method's constraints.
In DOM 3 (and therefore in PHP too), there was a "strictErrorChecking" boolean 
option.
When enabled, an exception is thrown when such a method's constraints are not 
met.
When disabled, no exception is thrown; instead, a warning is emitted and false 
is returned.
The DOM living spec no longer has that option and always uses exceptions.
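To illustrate the two modes, here is a minimal sketch of how the existing 
DOMDocument class behaves with strictErrorChecking on and off. It assumes PHP 8 
with ext/dom enabled; the exact exception message and warning text may differ 
between PHP versions.

```php
<?php
// Sketch only: existing DOMDocument behaviour under both settings of
// strictErrorChecking. Assumes PHP 8 with ext/dom enabled.

$doc = new DOMDocument();

// strictErrorChecking defaults to true: a constraint violation throws DOMException.
$strictThrew = false;
try {
    $doc->createElement(""); // "" is not a valid element name
} catch (DOMException $e) {
    $strictThrew = true;
    echo "strict: DOMException with code ", $e->getCode(), "\n";
}

// With the option disabled, the same call emits a warning and returns false.
$doc->strictErrorChecking = false;
$lenient = @$doc->createElement(""); // "@" hides the warning for this demo
var_dump($lenient);
```

Only the first branch would remain in the new classes, so the "|false" part of 
such return types could be dropped.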

In the new classes I would likewise only use exceptions and not include the 
strictErrorChecking option, as the spec demands.
This cleans up the return types, because methods no longer need to return false 
on error.

For example: $doc->createElement("") should throw.
Or $element->setAttributeNode($attr) should throw when $attr is already used by 
another element.
Etc.

> 
> 1. Acting as a user-agent: in this case, errors should be handled as
> described in the spec for a user-agent, e.g., switching to Text-Mode
> in some cases and gobbling up the rest of the document.

The HTML 5 RFC follows the spec error recovery rules for user agents.

> 
> 2. Acting as a conformance checker: in this case, a list of errors
> should be available to the programmer instead of bailing when parsing
> (e.g., not switching to Text-Mode, but trying to continue parsing the
> document, as described in the parser spec for conformance checking).
> 
> 3. Acting as a document builder: Putting the document into an invalid
> state should emit at least a warning. However, it's likely better to
> let the user-agent handle the invalid DOM (as this is probably more
> forward-thinking for new HTML that currently doesn't exist). This is
> actually one of the biggest draw-backs to the current implementation
> as it requires a number of "hacks" to build valid HTML.

Kind regards
Niels

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
