Re: parsing XML

Jenda Krynicky Thu, 25 Jan 2007 07:11:14 -0800

From: Kevin Viel <[EMAIL PROTECTED]>
> I have obtained the results of a query in XML format:
> 
> <?xml version="1.0"?>
> <!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD eSummaryResult, 29
> October 2004//EN"
> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd">
> <eSummaryResult>
> <DocSum>
>          <Id>4609</Id>
>          <Item Name="Name" Type="String">MYC</Item>
>          <Item Name="Description" Type="String">v-myc myelocytomatosis viral oncogene homolog (avian)</Item>
>          <Item Name="Orgname" Type="String">Homo sapiens</Item>
>          <Item Name="Status" Type="Integer">0</Item>
>          <Item Name="CurrentID" Type="Integer">0</Item>
>          <Item Name="Chromosome" Type="String">8</Item>
>          <Item Name="GeneticSource" Type="String">genomic</Item>
>          <Item Name="MapLocation" Type="String">8q24.12-q24.13</Item>
>          <Item Name="OtherAliases" Type="String">c-Myc</Item>
>          <Item Name="OtherDesignations" Type="String">avian myelocytomatosis viral oncogene homolog|myc proto-oncogene protein|v-myc avian myelocytomatosis viral oncogene homolog</Item>
>          <Item Name="NomenclatureSymbol" Type="String">MYC</Item>
>          <Item Name="NomenclatureName" Type="String">v-myc myelocytomatosis viral oncogene homolog (avian)</Item>
>          <Item Name="NomenclatureStatus" Type="String">Official</Item>
>          <Item Name="TaxID" Type="Integer">9606</Item>
>          <Item Name="Mim" Type="List">
>                  <Item Name="int" Type="Integer">190080</Item>
>          </Item>
> </DocSum>
> 
> 
> I would like to search for certain keywords and extract all genes in
> this query that meet the criteria.  Can someone recommend a module?  I
> looked at XML::Simple::DTDReader.


Yeah, the module looks fine. There are of course many other options, 
one being XML::Rules. Assuming the <DocSum> is repeated and you want 
to do something with only some of them, you could use something 
like:

use XML::Rules;

my $parser = XML::Rules->new(
 rules => [
  Id => 'content',
  Item => sub {$_[1]->{Name} => $_[1]->{_content}},
   # from the <Item> tags we are interested in the content 
   # and want to use the Name attribute as the key to access 
   # that value. We ignore the Type attribute.
  DocSum => sub {
   # by now all the data from the <Item>s are in the %{$_[1]} hash

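   # string comparison, since chromosomes such as X or Y are not numeric
   if ($_[1]->{Chromosome} ne '8'
   or $_[1]->{NomenclatureName} !~ /\bviral\b/) {
    # skip this <DocSum> unless the gene is on chromosome 8
    # and its nomenclature name contains the word 'viral'
    return;
   }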

   # do something with the data
   # or return the part of the data you want to keep, using whatever
   # suits you best as the key
   return $_[1]->{Name} => $_[1];
  },
  eSummaryResult => 'pass no content',
 ]
);

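# parse() takes the XML as a string or an open filehandle;
# use parsefile() if you have a file name
my $data = $parser->parse($the_xml_or_file);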

print $data->{MYC}{NomenclatureName}, "\n";
__END__
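
If you end up with several genes in $data and want to test more than 
one keyword, something along these lines might do. It is only a rough 
sketch: the hash below is a hand-made stand-in for what I would expect 
parse() to return with the rules above (one entry per kept <DocSum>, 
keyed by Name), and matches_keywords() and the field and keyword lists 
are names I made up for the example, not part of XML::Rules.

```perl
use strict;
use warnings;

# Assumption: hand-made stand-in for the hash ref returned by
# $parser->parse(...), one entry per kept <DocSum>, keyed by Name.
my $data = {
    MYC => {
        Id               => 4609,
        Name             => 'MYC',
        Chromosome       => '8',
        MapLocation      => '8q24.12-q24.13',
        NomenclatureName =>
            'v-myc myelocytomatosis viral oncogene homolog (avian)',
    },
};

# Hypothetical helper: returns 1 if any of the listed fields contains
# any of the keywords as a whole word (case-insensitive), 0 otherwise.
sub matches_keywords {
    my ($record, $fields, $keywords) = @_;
    for my $field (@$fields) {
        my $value = $record->{$field};
        next unless defined $value;    # field missing from this record
        for my $keyword (@$keywords) {
            return 1 if $value =~ /\b\Q$keyword\E\b/i;
        }
    }
    return 0;
}

# Illustrative choices; adjust to the fields and keywords you care about.
my @fields   = qw(Description NomenclatureName OtherDesignations);
my @keywords = qw(viral oncogene);

for my $name (sort keys %$data) {
    my $gene = $data->{$name};
    next unless matches_keywords($gene, \@fields, \@keywords);
    print join("\t", $name, $gene->{Id}, $gene->{MapLocation}), "\n";
}
```

The same test could just as well go inside the DocSum rule, so that 
records that do not match are dropped while parsing instead of 
afterwards.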

HTH, Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery


