From: Kevin Viel <[EMAIL PROTECTED]>
> I have obtain results of a query in XML format:
>
> <?xml version="1.0"?>
> <!DOCTYPE eSummaryResult PUBLIC
>   "-//NLM//DTD eSummaryResult, 29 October 2004//EN"
>   "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd">
> <eSummaryResult>
> <DocSum>
>   <Id>4609</Id>
>   <Item Name="Name" Type="String">MYC</Item>
>   <Item Name="Description" Type="String">v-myc myelocytomatosis viral oncogene homolog (avian)</Item>
>   <Item Name="Orgname" Type="String">Homo sapiens</Item>
>   <Item Name="Status" Type="Integer">0</Item>
>   <Item Name="CurrentID" Type="Integer">0</Item>
>   <Item Name="Chromosome" Type="String">8</Item>
>   <Item Name="GeneticSource" Type="String">genomic</Item>
>   <Item Name="MapLocation" Type="String">8q24.12-q24.13</Item>
>   <Item Name="OtherAliases" Type="String">c-Myc</Item>
>   <Item Name="OtherDesignations" Type="String">avian myelocytomatosis viral oncogene homolog|myc proto-oncogene protein|v-myc avian myelocytomatosis viral oncogene homolog</Item>
>   <Item Name="NomenclatureSymbol" Type="String">MYC</Item>
>   <Item Name="NomenclatureName" Type="String">v-myc myelocytomatosis viral oncogene homolog (avian)</Item>
>   <Item Name="NomenclatureStatus" Type="String">Official</Item>
>   <Item Name="TaxID" Type="Integer">9606</Item>
>   <Item Name="Mim" Type="List">
>     <Item Name="int" Type="Integer">190080</Item>
>   </Item>
> </DocSum>
>
> I would like search for certain keywords and abstract all gene in this
> query that meet the criteria. Can someone recommend a module? I
> looked at XML::Simple::DTDReader.
Yeah, the module looks fine. There are of course many options, one being XML::Rules. Assuming the <DocSum> element is repeated and you want to do something with only some of them, you could use something like:

    use XML::Rules;

    my $parser = XML::Rules->new(
        rules => [
            Id => 'content',

            # From the <Item> tags we are interested in the content
            # and want to use the Name attribute as the key to access
            # that value. We ignore the Type attribute.
            Item => sub { $_[1]->{Name} => $_[1]->{_content} },

            DocSum => sub {
                # By now all the data from the <Item>s are in the
                # %{$_[1]} hash.
                # Keep only genes on chromosome 8 whose name mentions
                # 'viral'. String comparison ('ne') is used because
                # Chromosome can also be 'X' or 'Y'.
                if ($_[1]->{Chromosome} ne '8'
                    or $_[1]->{NomenclatureName} !~ /\bviral\b/) {
                    return;
                }

                # Do something with the data here, or return the part
                # of the data you want to keep, using whatever suits
                # you best as the key.
                return $_[1]->{Name} => $_[1];
            },

            eSummaryResult => 'pass no content',
        ]
    );

    my $data = $parser->parse($the_xml_or_file);
    print $data->{MYC}{NomenclatureName}, "\n";
    __END__

HTH, Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
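P.S. Once you have a hash keyed by gene name, searching it for keywords is plain Perl. Below is a rough sketch of that step. It assumes the parsed data is a hash of hashes as in the example above; the hand-written $data here only stands in for the parser's return value, and genes_matching() is a made-up helper name, not something XML::Rules provides.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash returned by the parser:
# keys are gene names, values are the per-<DocSum> hashes.
my $data = {
    MYC => {
        Id               => 4609,
        Name             => 'MYC',
        Chromosome       => '8',
        MapLocation      => '8q24.12-q24.13',
        NomenclatureName =>
            'v-myc myelocytomatosis viral oncogene homolog (avian)',
    },
};

# genes_matching() is an illustrative helper (an assumption, not part
# of any module). Call it in list context; it returns the sorted names
# of the genes whose $field is defined and matches $regex.
sub genes_matching {
    my ($genes, $field, $regex) = @_;
    my @names = grep {
        defined $genes->{$_}{$field} and $genes->{$_}{$field} =~ $regex
    } keys %$genes;
    return sort @names;
}

# Print name, Id and map location for every gene that mentions 'oncogene'.
for my $name (genes_matching($data, 'NomenclatureName', qr/\boncogene\b/i)) {
    printf "%s\t%s\t%s\n",
        $name, $data->{$name}{Id}, $data->{$name}{MapLocation};
}
```

Passing the field name and a qr// pattern as arguments lets you reuse the same helper for Description, OtherAliases and so on without touching the parsing rules.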