On 14 August 2023 13:40:40 BST, Niels Dossche <dossche.ni...@gmail.com> wrote:
>And you load it into simpleXML, the result of calling json_encode($the_simplexml_object)

My usual reaction to this is "why would you take an object designed for accessing parts of an XML document, and serialise it to JSON?" Often, the answer turns out to be "because I don't understand SimpleXML objects, and have copied and pasted a weird hack to get a less useful array representation by round-tripping to JSON".
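
As a rough illustration of that contrast, here is a minimal sketch comparing the JSON round-trip with direct access. The sample XML and variable names are made up for the example and do not come from the thread.

```php
<?php
// Hypothetical sample document with a repeated child element.
$xml = simplexml_load_string('<list><item>a</item><item>b</item></list>');

// The copy-and-paste hack: round-trip through JSON to get a plain array.
$viaJson = json_decode(json_encode($xml), true);

// Direct access: iterate over the <item> children and cast each to string.
$direct = [];
foreach ($xml->item as $item) {
    $direct[] = (string) $item;
}

var_dump($viaJson['item'] === $direct); // bool(true)
```

The loop works the same way whether there is one <item> or many, so the calling code does not need to check whether it got a string or a list.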

On the other hand, the fact that the *debug* representation of SimpleXML objects misses out some parts causes a lot of confusion, and I've actually considered the *opposite* of what you suggest - leave the JSON alone, because people will have written production code based on it, but make the debug array more descriptive of how to use the object.

Either way, the challenge is coming up with something that's concise for simple structures, but comprehensive for more complex ones, particularly if you want it to be consistent. For instance:

- Do you assume tag names are unique within a parent, so use key=>value directly; or assume they're not, so use key=>[list,of,values]; or dynamically switch between the two?
- Do you care about the order of elements with different names, or prefer to group by name?
- Do you have any elements with both child tags and text, or attributes and text, or all three?
- Do you need to retain the order of text in relation to child elements (important for markup languages like HTML or DocBook)? Or is it enough to have a representation of "all text content" (the behaviour of SimpleXML's string cast)?
- Do you have any elements with namespaces? If so, do you want to use local prefixes (and include the xmlns attributes somewhere), or repeat the full namespace URI?
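
To make the first question concrete, here is a small sketch of how the existing json_encode() output handles repeated tag names. The two sample documents are invented for illustration; they differ only in how many <item> children they contain.

```php
<?php
// Hypothetical sample documents: one <item> child versus two.
$one = simplexml_load_string('<root><item>a</item></root>');
$two = simplexml_load_string('<root><item>a</item><item>b</item></root>');

// A single child is encoded as key => value ...
echo json_encode($one), "\n"; // {"item":"a"}

// ... while a repeated child switches to key => [list, of, values].
echo json_encode($two), "\n"; // {"item":["a","b"]}
```

This is the "dynamically switch between the two" option: concise for the simple case, but the shape of the result depends on the data rather than on the schema.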

There's a reason why both the DOM and SimpleXML provide object-oriented APIs for accessing the document, not a representation flattened to native types, and why both APIs are useful for different jobs - XML just isn't designed for flattening, and different patterns make sense for different documents / use cases.

Ultimately, I'm not that interested in trying to come up with a JSON or array representation that covers every possibility, because I think the only consistent answer would be horribly verbose - basically, describe every property that DOM would expose on each node.

For debug output, the main concern is showing what you'll get with various styles of access in SimpleXML, so a single "@text" => "foobarbaz" would make sense; or maybe even "(string)" => "foobarbaz" and rename "@attributes" to "->attributes()"
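
For reference, the access styles those labels would point to look roughly like this. It is only a sketch, and the sample element is made up for the example.

```php
<?php
// Hypothetical sample element with one attribute and some text.
$el = simplexml_load_string('<greeting lang="en">hello</greeting>');

// A string cast gives the element's text content.
echo (string) $el, "\n";          // hello

// Array-style access reads a single attribute by name.
echo (string) $el['lang'], "\n";  // en

// ->attributes() returns something you can loop over to see all of them.
foreach ($el->attributes() as $name => $value) {
    echo $name, ' = ', $value, "\n"; // lang = en
}
```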

Regards,

--

Rowan Tommins
[IMSoP]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
