From: Kevin Viel <[EMAIL PROTECTED]>
> Jenda Krynicky kindly provided:
>
> > use XML::Rules;
> >
> > my $parser = XML::Rules->new(
> >     rules => [
> >         Id => 'content',
> >         Item => sub {$_[1]->{Name} => $_[1]->{_content}},
> >             # from the <Item> tags we are interested in the content
> >             # and want to use the Name attribute as the key to access
> >             # that value. We ignore the Type attribute.
> >         DocSum => sub {
> >             # by now all the data from the <Item>s are in the %{$_[1]} hash
> >
> >             if ($_[1]->{Chromosome} != 8
> >                 or $_[1]->{NomenclatureName} !~ /\bviral\b/) {
> >                 # ignore everything that is outside the 8th chromosome
> >                 # or is not 'viral'
> >                 return;
> >             }
> >
> >             # do something with the data
> >             # or return the part of the data you want to keep using
> >             # whatever suits you best as the key
> >             return $_[1]->{Name} => $_[1];
> >         },
> >         eSummaryResult => 'pass no content',
> >     ]
> > );
> >
> > my $data = $parser->parse($the_xml_or_file);
> >
> > print $data->{MYC}{NomenclatureName}, "\n";
> > __END__
>
> I'd like to understand this better. It seems to be a reference
> (little arrow). Is that the same as using /@referenced_array, for
> instance?
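On the little arrow itself, a minimal sketch in core Perl may help. The variable names and sample values below are made up for illustration and are not part of the script above. A backslash, as in \@array, creates a reference; the thin arrow -> follows a reference to get at an element; the fat arrow => is just a comma that quotes the word on its left.

```perl
use strict;
use warnings;

# Hypothetical sample data, only for illustration.
my @genes = ('MYC', 'TP53');

my $aref = \@genes;                              # backslash: reference to a named array
my $href = { Name => 'MYC', Chromosome => 8 };   # braces: reference to an anonymous hash

my $first = $aref->[0];      # thin arrow: follow the reference, then index the array
my $name  = $href->{Name};   # thin arrow: follow the reference, then look up a key

# Inside a sub the arguments arrive in @_, so $_[1] is the second argument.
# When that argument is a hash reference, $_[1]->{Name} looks up a key through it.
sub name_of { return $_[1]->{Name} }
my $via_sub = name_of('Item', $href);

print "$first $name $via_sub\n";
```

So $_[1]->{Name} in the rules is not itself a reference; it is a lookup through the hash reference that arrives as the second argument.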
Assuming you use the code above as is, you end up with a reference to a
HoH (hash of hashes) in $data. The first level of keys will be the Names
of the genes (or whatever the content of the <DocSum> tags is), the
second level will be the values of the Name attributes from the <Item>
tags. You may want to run the script on a short XML and print the
returned data structure with

    use Data::Dumper;
    print Dumper($data);

> It seems to be a hash with the key "rules" and a four-item array as
> its value. The third item of this array is a hash with a subroutine,
> or anonymous function declaration, as its value.

The constructor of the XML::Rules object accepts several named
arguments, the most important being "rules". It is a reference to either
an array or a hash containing the "rules" to apply to the tags read from
the XML. Whenever a tag is fully parsed (including the </closing tag>!)
the module calls the specified subroutine (or builtin) to
massage/filter/process the data from the tag. Whatever the subroutine
returns is then made available to the subroutine specified for the
parent tag.

> I am wrong, correct?
>
> A) Correct, you were incorrect.
> B) Incorrect, you were correct.
> C) You're still buying beer.
>
> To start with specific questions, could someone explain:
>
> > Item => sub {$_[1]->{Name} => $_[1]->{_content}}

In this particular case, whenever an <Item ....>...</Item> is fully
parsed this subroutine is called. It ignores the Type attribute and
returns just the value of the Name attribute and the tag content, in
such a way that the former becomes a key and the latter the value in the
attribute hash of the parent tag, in this case <DocSum>. Later on, once
the </DocSum> closing tag is parsed, all the values from all the <Item>
tags within that <DocSum> will be available to the subroutine specified
for the <DocSum> tag, in the hash referenced by $_[1], like this:

    $_[1]->{Name}         # the value will be "MYC"
    $_[1]->{Description}  # = "v-myc myelocytomatosis viral oncogene homolog (avian)"
    etc.
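To make the flow of data from <Item> to <DocSum> concrete, here is a rough simulation in core Perl. It does not use XML::Rules at all: the loop is only a stand-in for what the module does with the key/value pairs a rule returns, and the <Item> hashes are hypothetical sample values (the Type values, the Chromosome value and the NomenclatureName value are assumptions made for the example).

```perl
use strict;
use warnings;
use Data::Dumper;

# The two rules from the thread, called here by hand with the
# tag name and a hash reference as the first two arguments.
my $item_rule   = sub { $_[1]->{Name} => $_[1]->{_content} };
my $docsum_rule = sub {
    return if $_[1]->{Chromosome} != 8
           or $_[1]->{NomenclatureName} !~ /\bviral\b/;
    return $_[1]->{Name} => $_[1];
};

# Hypothetical <Item> tags of one <DocSum>, as attribute hashes plus _content.
my $desc  = 'v-myc myelocytomatosis viral oncogene homolog (avian)';
my @items = (
    { Name => 'Name',             Type => 'String', _content => 'MYC' },
    { Name => 'Description',      Type => 'String', _content => $desc },
    { Name => 'Chromosome',       Type => 'String', _content => '8'   },
    { Name => 'NomenclatureName', Type => 'String', _content => $desc },
);

# Stand-in for the module: each pair returned by the Item rule
# is stored in the hash that the parent <DocSum> rule will see.
my %docsum;
for my $item (@items) {
    my ($key, $value) = $item_rule->('Item', $item);
    $docsum{$key} = $value;
}

# The DocSum rule either drops the record or returns Name => \%docsum.
my %data = $docsum_rule->('DocSum', \%docsum);

print Dumper(\%data);
print $data{MYC}{NomenclatureName}, "\n";
```

The Type attribute never reaches %docsum because the Item rule does not return it, and a record that fails the chromosome or 'viral' test would leave %data empty.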
HTH, Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
	-- Terry Pratchett in Sourcery