Re: [PHP-DEV] SimpleXML: Moving Forward

Adam Trachtenberg Tue, 13 Jan 2004 08:42:10 -0800

On Jan 13, 2004, at 9:21 AM, Christian Schneider wrote:

But let's take a look at how I'd use it (xml formatted for readability):

$foo = simplexml_load_string('
<foo x:a="xa" y:a="ya">
  ab
  <foo2>foo2a</foo2>
  cd
  <foo2>foo2b</foo2>
  ef
  <foo3>
    foo3
    <foo4>foo4</foo4>
    foo3
  </foo3>
  gh
</foo>');

Ugh. This is pretty much the limit of what I think is reasonable for SimpleXML to handle. I think the API would be more consistent if the document looked like:

<foo x:a="xa" y:b="yb">
  <foo2>foo2a</foo2>
  <foo2>foo2b</foo2>
  <foo3>
    <foo4>foo4</foo4>
  </foo3>
</foo>

However, that may be placing too many restrictions upon documents to make SimpleXML useful. Like I said before, I've never tried to use SimpleXML with text nodes and elements sharing the same parent.
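
To make the mixed-content case concrete, here is a minimal sketch of how a current PHP release treats text nodes and elements that share a parent. It is an assumption that this matches the build discussed in this thread; getName() in particular was only added to SimpleXML later, and the compacted document is just the example above with whitespace and attributes removed.

```php
<?php
// Sketch only: behaviour of a current PHP release, not the 2004 build.
$foo = simplexml_load_string(
    '<foo>ab<foo2>foo2a</foo2>cd<foo2>foo2b</foo2>ef' .
    '<foo3>foo3<foo4>foo4</foo4>foo3</foo3>gh</foo>'
);

// Iteration visits element children only; the text nodes are skipped.
$names = [];
foreach ($foo->children() as $child) {
    $names[] = $child->getName();
}

// A string cast joins the text nodes that sit directly under <foo>.
$text = (string) $foo;

echo implode(' ', $names), "\n";
echo $text, "\n";
```

So the text between the elements is reachable only as one concatenated string, which is why documents without mixed content fit the API better.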

foreach ($foo as $node) => foo2a foo2b foo3
foreach ($foo->foo2 as $node) => foo2a foo2b
foreach ($foo->foo3 as $node) => foo4
foreach ((array)$foo->foo3 as $node) => foo4
foreach ($foo->foo3->foo4 as $node) => nothing
foreach ((array)$foo->foo3->foo4 as $node) => foo4
What seems wrong here is that to output nodes where there can be 0 to multiple instances I have to do something like:

if ($foo->$nodename) {
    if (is_array($foo->$nodename)) {
        foreach ($foo->$nodename as $node)
            echo "$node\n";
    } else
        echo "{$foo->$nodename}\n";
} else
    echo "No node $nodename found\n";

$nodename = 'node1' => No node node1 found
$nodename = 'node2' => foo2a foo2b
$nodename = 'node3' => foo3

I raised this as an issue yesterday. Sterling said he'd look into this. However, to tie this into my reply to Rob, I think there's some expectation that the developer knows what she's getting and that the cases where you have 0, 1, or many potential elements are few. (That said, I just developed something where I do essentially this all over the place and it sucks.)
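
For comparison, a sketch of the 0, 1, or many case without the is_array() branching. It assumes a current PHP release, where a child property can be iterated whether it matches nothing, one element, or several; that is not necessarily how the build discussed here behaves. child_values() is a hypothetical helper, not part of SimpleXML.

```php
<?php
// Hypothetical helper: collect the text of every child called $name,
// whether there are zero, one, or many of them.
function child_values(SimpleXMLElement $parent, string $name): array
{
    $values = [];
    foreach ($parent->{$name} as $node) {
        $values[] = trim((string) $node);
    }
    return $values;
}

$foo = simplexml_load_string(
    '<foo><foo2>foo2a</foo2><foo2>foo2b</foo2>' .
    '<foo3><foo4>foo4</foo4></foo3></foo>'
);

foreach (['foo1', 'foo2', 'foo3'] as $name) {
    $values = child_values($foo, $name);
    if ($values === []) {
        echo "No node $name found\n";
    } else {
        echo $name, ': ', implode(' ', $values), "\n";
    }
}
```

The caller still has to decide what "no node" means, but the three cases go through the same loop.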

Here are my thoughts on solutions:

1) Place all elements in an array (or nodeList) regardless of whether there's 0, 1, or many. This is the DOM solution. This just leads to annoying code where you need to do $foo->item(0) and $foo->firstChild.

However, I don't really see any way around this otherwise. Either it's general or not. It can't be both. (Unless there's some magical type that's both an array and a scalar.) I'm willing to put up with this headache because the clunkiness here is outweighed by the niceness for most cases.

2) If a document has an XML Schema (or RelaxNG schema), SimpleXML could optionally inspect the schema to see if there are minOccurs and maxOccurs attributes in the schema for an element. If maxOccurs > 1, then the elements would be placed in an array even if there was only one element in that particular instance.

This allows us to solve the problem by making the user specifically tell us how they want SimpleXML to handle a document. It does add some overhead, but simplicity is often more complex behind the scenes. This has the benefit of using an existing XML technology to solve the problem, but I don't know how expensive it'd be.
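
A rough sketch of what the schema inspection in option 2 could look like if done by hand. SimpleXML has no such option; repeatable_elements() is a hypothetical helper, the schema is invented for illustration, and only the maxOccurs attribute on named xs:element declarations is examined.

```php
<?php
// Invented schema for illustration: <foo2> may repeat, <foo3> may not.
$schema = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="foo2" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="foo3" type="xs:string" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Hypothetical helper: names of elements the schema allows more than once.
function repeatable_elements(string $schema): array
{
    $xsd = simplexml_load_string($schema);
    $xsd->registerXPathNamespace('xs', 'http://www.w3.org/2001/XMLSchema');

    $names = [];
    foreach ($xsd->xpath('//xs:element[@name and @maxOccurs]') as $element) {
        $max = (string) $element['maxOccurs'];
        if ($max === 'unbounded' || (int) $max > 1) {
            $names[] = (string) $element['name'];
        }
    }
    return $names;
}

$repeatable = repeatable_elements($schema);
echo implode(' ', $repeatable), "\n";
```

A loader could consult this list and return an array for those names even when an instance document contains only one such element. The cost is one extra parse of the schema per document type.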

Again, my opinion is that arbitrary XML documents are best parsed using DOM and well-defined ones are best parsed using SimpleXML.

Attributes are handled as associative arrays, so given an element with 2 attributes with the same name, but in different namespaces, it won't work: <foo a:bar="x" b:bar="y">
Right now foo['bar'] will be an array('x', 'y') in that case. We're losing the namespaces here but get the values. Simple or broken? Not sure.

This case still makes me puke. :)

Right now, SimpleXML always makes you lose the namespaces unless you use XPath. I don't think it's too much to ask that if you can handle XML Namespaces you can also handle XPath. I would prefer to guide people through XPath in these nasty cases than make the general API handle them.
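
A minimal sketch of the XPath route for the two same-named attributes. The namespace URIs are invented for illustration, and the xpath() and registerXPathNamespace() method names are those of current PHP releases, which may differ from the function names in the build under discussion.

```php
<?php
// The urn:example:* namespace URIs are made up for this sketch.
$foo = simplexml_load_string(
    '<foo xmlns:a="urn:example:a" xmlns:b="urn:example:b" ' .
    'a:bar="x" b:bar="y"/>'
);

// Bind the prefixes used in the XPath expressions below.
$foo->registerXPathNamespace('a', 'urn:example:a');
$foo->registerXPathNamespace('b', 'urn:example:b');

// Each query selects exactly one of the two "bar" attributes.
$aBar = $foo->xpath('@a:bar');
$bBar = $foo->xpath('@b:bar');

echo 'a:bar = ', (string) $aBar[0], "\n";
echo 'b:bar = ', (string) $bBar[0], "\n";
```

The namespace survives because the query names it explicitly, so the general attribute syntax does not have to deal with the collision.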

As right now there is no easy (read: non-xpath/xquery) way of getting the attributes hidden in the magic array of $foo, I think getAttributes should be added too.

AFAIK, it's actually also impossible to find out the name of the document element using SimpleXML, even using XPath.

I ended up doing:

$xml = simplexml_load_string($data);
$type = dom_import_simplexml($xml)->tagName;

Without this feature, it's difficult to make SimpleXML work in cases where a page could be potentially processing two different XML documents because you can't inspect the XML document to figure out what type it is. :)
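
A sketch of that dispatch, wrapping the two lines above in a hypothetical document_type() helper. The sample documents are invented. Later PHP releases also offer SimpleXMLElement::getName(), which avoids the DOM round trip; it is shown only for comparison.

```php
<?php
// Hypothetical helper: report the name of the document element.
function document_type(string $data): string
{
    $xml = simplexml_load_string($data);
    return dom_import_simplexml($xml)->tagName;
}

// Two invented documents that one page might have to tell apart.
$documents = [
    '<rss version="2.0"><channel/></rss>',
    '<feed><entry/></feed>',
];

foreach ($documents as $data) {
    switch (document_type($data)) {
        case 'rss':
            echo "handle as RSS\n";
            break;
        case 'feed':
            echo "handle as feed\n";
            break;
        default:
            echo "unknown document type\n";
    }
}

// In later PHP releases the same name is available without DOM.
$name = simplexml_load_string($documents[0])->getName();
```

Note that tagName includes the prefix when the document element is namespaced, so a dispatch like this should compare against the qualified name.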

No other functions though. Should these be methods? I think so.
$foo = simplexml_load_string('<foo>ab<foo2>test</foo2>cd</foo>');
$ns = $foo->xsearch('child::text()');
foreach ($ns as $node)
  print "Node Value: ".$node."\n";
I would actually expect abcd but only once:
Node Value: abcd
Concatenating all text parts _and_ returning them once for each part definitely seems wrong.

Aren't those two lines contradictory? :)
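
If the individual text parts are what is wanted, one way to get them is to drop down to DOM for that step. This is a sketch assuming a current PHP release with the DOM extension loaded; it is not a proposal for the SimpleXML API itself.

```php
<?php
$foo = simplexml_load_string('<foo>ab<foo2>test</foo2>cd</foo>');

// Walk the DOM children and keep only the text nodes, one entry per node.
$parts = [];
foreach (dom_import_simplexml($foo)->childNodes as $child) {
    if ($child->nodeType === XML_TEXT_NODE) {
        $parts[] = $child->nodeValue;
    }
}

foreach ($parts as $part) {
    print "Node Value: " . $part . "\n";
}

// The SimpleXML string cast gives the concatenation instead.
$joined = (string) $foo;
print "Joined: " . $joined . "\n";
```

That keeps both readings available: the pieces through DOM and the concatenated value through SimpleXML.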

+1 on getChildren/getAttributes (function or method)
-1 on more functions
I think it's quite usable this way and simple enough to use to earn the name SimpleXML.

I think this is where we're coming out. (Modulo the XPath and Validation functions.)

-adam

--
adam trachtenberg
[EMAIL PROTECTED]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
