Re: parsing XML

Jenda Krynicky Sat, 27 Jan 2007 10:06:24 -0800

From: Kevin Viel <[EMAIL PROTECTED]>
> Jenda Krynicky kindly provided:
> 
> > use XML::Rules;
> > 
> > my $parser = XML::Rules->new(
> >  rules => [
> >   Id => 'content',
> >   Item => sub {$_[1]->{Name} => $_[1]->{_content}},
> >    # from the <Item> tags we are interested in the content 
> >    # and want to use the Name attribute as the key to access 
> >    # that value. We ignore the Type attribute.
> >   DocSum => sub {
> >    # by now all the data from the <Item>s are in the %{$_[1]} hash
> > 
> >    if ($_[1]->{Chromosome} != 8 
> >    or $_[1]->{NomenclatureName} !~ /\bviral\b/) {
> >     # keep only the 'viral' entries on the 8th chromosome,
> >     # ignore everything else
> >     return;
> >    }
> > 
> >    # do something with the data
> >    # or return the part of the data you want to keep using whatever
> >    # suits you best as the key
> >    return $_[1]->{Name} => $_[1];
> >   },
> >   eSummaryResult => 'pass no content',
> >  ]
> > );
> > 
> > my $data = $parser->parse($the_xml_or_file);
> > 
> > print $data->{MYC}{NomenclatureName}, "\n";
> > __END__
> 
> I'd like to understand this better.  It seems to be a reference
> (little arrow).  Is that the same as using \@referenced_array, for
> instance?


Assuming you use the code above as is, you end up with a reference to 
a hash of hashes (HoH) in $data. The first-level keys are the Names of 
the genes (or whatever the <DocSum> tags describe), the second-level 
keys are the values of the Name attributes from the <Item> tags.

You may want to run the script on a short XML file and print the 
returned data structure with

use Data::Dumper;
print Dumper($data);
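
On the "little arrow" question, here is a minimal sketch in core Perl. 
The %genes hash is typed in by hand as a stand-in for what the parser 
returns (it is not real parser output); the values are the ones 
mentioned in this thread. The backslash creates a reference, the arrow 
follows one.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-written stand-in for the parsed data; not produced by XML::Rules.
my %genes = (
    MYC => {
        Name        => 'MYC',
        Chromosome  => 8,
        Description => 'v-myc myelocytomatosis viral oncogene homolog (avian)',
    },
);

# The backslash takes a reference to an existing variable.
my $data = \%genes;

# The arrow dereferences: follow the reference, then look up the key.
my $chromosome = $data->{MYC}{Chromosome};   # short for $data->{MYC}->{Chromosome}
my $same       = ${$data}{MYC}{Chromosome};  # older, equivalent syntax

print "$chromosome $same\n";

# Sorted keys make the dump easier to compare between runs.
$Data::Dumper::Sortkeys = 1;
print Dumper($data);
```

So \@array (or \%hash) makes a reference, and -> is what you use 
afterwards to get at the data behind it.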

> It seems to be a hash with the key "rules" and a four-item array as
> its value.  The third item of this array is a hash with a subroutine,
> or anonymous function declaration, as its value.

The constructor of the XML::Rules object accepts several named 
arguments, the most important being "rules". It is a reference to 
either an array or a hash containing the "rules" to apply to the tags 
read from the XML. Whenever a tag is fully parsed (including the 
</closing tag>!) the module calls the specified subroutine (or 
builtin) to massage/filter/process the data from the tag. Whatever 
the subroutine returns is then made available to the subroutine 
specified for the parent tag.
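
To make the shape of that argument concrete: => is just a comma that 
quotes the word on its left, so the array behind "rules" is a flat 
list of tag/rule pairs, not four items. A small core-Perl sketch (the 
$rules variable and the stub subroutines are only for illustration, 
nothing here calls XML::Rules):

```perl
use strict;
use warnings;

# Same shape as the "rules" argument: a reference to an array of pairs.
my $rules = [
    Id             => 'content',
    Item           => sub { 'stub' },
    DocSum         => sub { 'stub' },
    eSummaryResult => 'pass no content',
];

my $count  = @$rules;   # 8 elements: 4 tag names and 4 rules
my %by_tag = @$rules;   # a list of pairs turns into a hash keyed by tag name

for my $tag (sort keys %by_tag) {
    my $kind = ref $by_tag{$tag} eq 'CODE'
        ? 'subroutine'
        : "builtin '$by_tag{$tag}'";
    print "$tag: $kind\n";
}
```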

> I am wrong, correct?
> 
>    A) Correct, you were incorrect.
>    B) Incorrect, you were correct.
>    C) You're still buying beer.
> 
> To start with specific questions, could someone explain:
> 
>  >   Item => sub {$_[1]->{Name} => $_[1]->{_content}}

In this particular case, whenever the <Item ....>...</Item> is fully 
parsed this subroutine is called. It ignores the Type attribute and 
returns just the value of the Name attribute and the tag content in 
such a way that the first becomes a key and the latter the value in 
the attribute hash of the parent tag, in this case <DocSum>.
Later on, once the </DocSum> closing tag is parsed, all the values 
from all the <Item> tags within that <DocSum> will be available in 
the subroutine specified for the <DocSum> tag in the hash referenced 
by $_[1] like this:

 $_[1]->{Name}        # the value will be "MYC"
 $_[1]->{Description} # the value will be
                      # "v-myc myelocytomatosis viral oncogene homolog (avian)"

etc.
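
The hand-over from <Item> to <DocSum> can be imitated in plain Perl. 
This is only a simulation of the idea, not the module's internals: 
the @items list is a hand-made stand-in for three parsed tags, the 
Type values are assumed, and the loop plays the role of the module 
merging each returned pair into the parent's hash.

```perl
use strict;
use warnings;

# The Item rule from the thread, with its first two arguments named.
my $item_rule = sub {
    my ($tag_name, $attrs) = @_;
    return $attrs->{Name} => $attrs->{_content};
};

# Hand-made stand-ins for three parsed <Item> tags (attributes plus _content).
my @items = (
    { Name => 'Name',        Type => 'String', _content => 'MYC' },
    { Name => 'Chromosome',  Type => 'String', _content => '8' },
    { Name => 'Description', Type => 'String',
      _content => 'v-myc myelocytomatosis viral oncogene homolog (avian)' },
);

# Imitate the merge: each returned key/value pair lands in the parent's hash.
my %docsum;
for my $item (@items) {
    my ($key, $value) = $item_rule->('Item', $item);
    $docsum{$key} = $value;
}

print "$docsum{Name} is on chromosome $docsum{Chromosome}\n";
```

The Type attribute never reaches %docsum because the rule simply does 
not return it.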

HTH, Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery

