Re: Question about XML::Simple [was: XML to inMemory Hash]

Jenda Krynicky Thu, 02 Aug 2007 01:59:23 -0700

From: Rob Dixon <[EMAIL PROTECTED]>
> Thomas Polnik wrote:
> >
> >> Almost anything is better than XML::Simple, but no module can
> >> easily make your data any smaller.
> >
> > I have used XML::Simple without any problems for some years. Which
> > problems could I get with this package? My program converts
> > many small XML files (<100 kB) to a Perl structure daily. Until now I
> > have had no errors or warnings, as long as the XML file is OK.
> > XML::Simple is for me a very simple way to get easy access to the
> > XML data. (But I have never tried it with huge XML files.)
> 
> Once you have a working application with XML::Simple it will probably
> continue to work. But it uses an internal representation of XML data
> which is not a proper model, in particular:
> 
> - It doesn't preserve a distinction between attributes and child elements


Which makes the handling simpler and only ever matters if you intend 
to tweak the data and export it in more or less the same format.
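
A minimal sketch of what that looks like in practice, assuming 
XML::Simple is installed. The <server> element and its "name" field 
are made up for illustration, and KeyAttr => [] is only there to 
switch off key folding so that nothing else is going on:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# The same datum, once as an attribute and once as a child element.
# (<server> and "name" are hypothetical names for this sketch.)
my $as_attr  = XMLin('<server name="alpha"/>',              KeyAttr => []);
my $as_child = XMLin('<server><name>alpha</name></server>', KeyAttr => []);

# Both parses are read with exactly the same expression, so the
# attribute/child distinction is gone from the resulting hash.
print "attribute: $as_attr->{name}\n";
print "child:     $as_child->{name}\n";
```

If you only read the data, that is a convenience. If you wanted to 
write the XML back out in its original shape, you would have to tell 
XMLout() which keys were attributes yourself.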
 
> - It doesn't preserve the order of XML elements within a document

This is not exactly true. It doesn't preserve the order of different 
elements within one parent (which IMHO should not matter at all for 
data oriented XML, and no one in their right mind would use XML::Simple 
for document oriented XML), but it does preserve the order of the 
same elements within one parent. Which is the only place where, IMHO, 
the order might matter. 
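
A small sketch of the difference, assuming XML::Simple is installed. 
The element names <list>, <a> and <b> are invented for the example:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# <a> and <b> are interleaved in the document.
my $xml  = '<list><a>1</a><b>2</b><a>3</a><b>4</b></list>';
my $data = XMLin($xml, KeyAttr => []);

# Repeated elements of the same name end up in one array each, in
# document order. What is lost is how <a> and <b> were interleaved.
print "a: @{ $data->{a} }\n";
print "b: @{ $data->{b} }\n";
```

From $data alone you cannot tell whether the document went a, b, a, b 
or a, a, b, b, but within each array the order is the document order.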

> - It generally needs the judicious use of the ForceArray and ForceContent
> options to get your XML data represented usefully.

When using some DOM or other object maze you likewise have to know 
which tags are to be expected to repeat and handle them differently 
than the ones that don't. And it's a question whether it's better to 
get a "Can't use an array reference as a hash reference" error or to 
silently ignore all but the first occurrence when a tag you did not 
expect to repeat does. I think the first is more in line with the XML 
philosophy. 
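
To make the ForceArray point concrete, here is a sketch assuming 
XML::Simple is installed. The <order>/<item>/<sku> names are made up, 
and the exact text of the runtime error depends on your Perl version, 
so the code only prints whatever it gets:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical documents: one <item>, then two <item>s.
my $one = '<order><item><sku>A1</sku></item></order>';
my $two = '<order><item><sku>A1</sku></item>'
        . '<item><sku>B2</sku></item></order>';

# Without ForceArray the shape depends on how many <item>s there are.
my $d_one = XMLin($one, KeyAttr => []);
my $d_two = XMLin($two, KeyAttr => []);
print "one item:  ", ref($d_one->{item}), "\n";
print "two items: ", ref($d_two->{item}), "\n";

# Code written for the single-item shape dies loudly on the other one
# instead of quietly returning only the first <item>.
my $sku = eval { $d_two->{item}{sku} };
my $err = $@;
print "error: $err" if $err;

# With ForceArray the shape is the same in both cases.
my @forced;
for my $xml ($one, $two) {
    my $d = XMLin($xml, KeyAttr => [], ForceArray => ['item']);
    push @forced, join(',', map { $_->{sku} } @{ $d->{item} });
}
print "forced: $_\n" for @forced;
```

Listing the tags that may repeat in ForceArray is the same knowledge 
you would need with any other parser; here it is just stated once, up 
front, instead of at every access.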

> - There is no single consistent internal representation of an XML document,
> and you have to know a lot about the content of the XML and the options
> used when parsing it before you can process the internal data

You usually need to know just as much no matter what parser/style you 
use. Of course in some cases the ability to ask for a node that's 
"somewhere there, god knows where" you get from XPath and similar 
technologies is nice, but if you need to process most of the data in 
the XML you need to know the structure just as well, no matter 
whether you use XML::Simple, a DOM based parser or anything else.

> In addition it generates a Perl structure of heavily nested arrays and hashes
> which prevents you from accessing the data in ways usually used with XML, and
> makes it difficult to write concise code

You mean

$root->first_child('bleargh')->first_child_text('foo')

is more concise than

$root->{bleargh}[0]{foo}{content}

?
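
For reference, a sketch of a document and a set of options under 
which the second expression works, assuming XML::Simple is installed. 
The "lang" attribute is my own addition; it is there because the 
{content} key only shows up when an element has both attributes and 
text (or when ForceContent is set):

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# <foo> carries an attribute as well as text, so XML::Simple stores
# the text under the 'content' key. ("lang" is a made-up attribute.)
my $xml = '<root><bleargh><foo lang="en">hello</foo></bleargh></root>';

# ForceArray on <bleargh> makes the [0] index valid even though the
# element occurs only once in this document.
my $root = XMLin($xml, ForceArray => ['bleargh'], KeyAttr => []);

my $text = $root->{bleargh}[0]{foo}{content};
print "$text\n";
```

Without the attribute on <foo> and without ForceContent, the text 
would sit directly in $root->{bleargh}[0]{foo} and the trailing 
{content} would go away.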

You may get a simpler data structure from XML::Rules, as it gives you 
more detailed control over what to keep from each tag and how, 
but generally I do find the data structure provided by XML::Simple 
much easier to work with than the maze of objects you get from DOM 
based modules.

> However to my knowledge it does what the documentation describes and if you
> have a working application then that is fine, although in my opinion it will
> be less concise, less readable and less maintainable than something written
> with a different module.
>
> Rob

I seriously doubt it. Unless you're only parsing the XML for one 
snippet of data hidden somewhere in the muddle.

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery


