Re: XML::LibXML navigation

Rob Dixon Fri, 12 Jan 2007 10:29:36 -0800

Beginner wrote:
> On 12 Jan 2007 at 17:06, Rob Dixon wrote:
>> Beginner wrote:
>>>
>>> In the sample data below you'll see some addresses with "DO NOT..."
>>> in. I can locate them easily enough but I am struggling to navigate
>>> back up the DOM to access the code so I can record the code with
>>> faulty addresses.
>>>
>>> Here's my effort. Can anyone help me either to move back up to the
>>> right element node or catch the code node before I begin to loop
>>> through the address line(s).
>>>
>>>
>>> ======= My Effort ==========
> #!/usr/bin/perl
>
>
> my $file = 'ADDRESS.XML';
> open(FH,$file) or die "Can't open file $file: $!\n";
>
> my $parser = XML::LibXML->new;
> my $doc = $parser->parse_fh(\*FH);
>
> my @codes = $doc->findnodes('//code');
> my @lines = $doc->findnodes('//lines');


It's never a good idea to use the double-slash unless you really need it, as it
forces the XPath engine to search through the whole of the data for a matching
node name. If you are working with an awfully designed data structure and you
really have no idea where the nodes will appear then fine, but in this case you
can tell the software exactly where to look with

my @codes = $doc->findnodes('/dataroot/address/code');
my @lines = $doc->findnodes('/dataroot/address/lines');
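To make the comparison concrete, here is a minimal sketch that runs both forms against a small document. Only the element names (dataroot, address, code, lines, line) come from this thread; the sample values are invented for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data: only the element names come from the thread.
my $xml = <<'XML';
<dataroot>
  <address>
    <code>A001</code>
    <lines>
      <line>1 High Street</line>
      <line>Anytown</line>
    </lines>
  </address>
  <address>
    <code>A002</code>
    <lines>
      <line>DO NOT USE</line>
      <line>2 Low Road</line>
    </lines>
  </address>
</dataroot>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# '//code' asks for <code> elements at any depth in the document.
my @anywhere = $doc->findnodes('//code');

# The explicit path names exactly where the <code> elements live.
my @explicit = $doc->findnodes('/dataroot/address/code');

printf "%d via //code, %d via explicit path\n",
    scalar @anywhere, scalar @explicit;
```

On data shaped like this both calls return the same nodes; the explicit path simply states where they are instead of asking the engine to look everywhere.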

> for (my $i = 0; $i < $#codes; ++$i) {
>   #print $codes[$i]->string_value, "\t";
>   my @add = $lines[$i]->childNodes;
>   for ( my $a = 1; $a <$#add; ++$a) {
>     if ($add[$a]->string_value =~ /\s+NOT\s+/) {
>       print $codes[$i]->string_value,": ",$add[$a]-> string_value,"\n";
>     }
>   }
> }

This will probably work, but only coincidentally! You're relying on the
elements in the @codes and @lines arrays being paired exactly, which will be the
case only if every <address> node contains exactly one <code> element and
exactly one <lines> element. This may well be the case, but isn't something you
should be assuming. Note too that the test $i < $#codes stops one short of the
last element, because $#codes is the index of the last element, not the count.

Your code also does the same as mine, except that you print the address line
that was found to contain /\s+NOT\s+/ as well as the code of the address in
which it was found.
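One way to avoid depending on that pairing is to loop over the <address> elements and look up the code and the lines relative to each one. This is a rough sketch, not the only way to do it: the sample data is invented, and the second address deliberately has no <code>, which would throw the two parallel arrays out of step.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical data: the second <address> has no <code>, so the two
# document-wide lists would no longer line up index for index.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<dataroot>
  <address>
    <code>A001</code>
    <lines><line>1 High Street</line></lines>
  </address>
  <address>
    <lines><line>DO NOT USE</line></lines>
  </address>
  <address>
    <code>A003</code>
    <lines><line>3 Mill Lane</line></lines>
  </address>
</dataroot>
XML

my @flagged;
for my $address ($doc->findnodes('/dataroot/address')) {
    # Both lookups are relative to this one <address>, so the code
    # and the lines always belong to the same record.
    my $code = $address->findvalue('code');
    $code = '(no code)' if $code eq '';

    for my $line ($address->findnodes('lines/line')) {
        my $text = $line->textContent;
        push @flagged, "$code: $text" if $text =~ /\s+NOT\s+/;
    }
}

print "$_\n" for @flagged;
```

With the parallel-array approach, index 1 of the codes list would be A003 here, so the flagged line would be reported against the wrong record.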

>> If I understand you correctly then all you need is
>>
>> my @results = $doc->findnodes('/dataroot/address[contains(lines/line, "DO NOT USE")]');
>>
>> foreach my $address (@results) {
>>    my $code = $address->findvalue('code');
>>    print $code, "\n";
>> }
>>
>> which prints the code of all those addresses that have a line containing
>> 'DO NOT USE'. Is that what was required?
>>
>
> Yes ...and no. I guess I want to print out the 'code' for any address
> so that I can get the data corrected but I guess I would also like to
> remove those records at the /dataroot/address level so they don't
> appear in the file.

You mean you want to produce a modified version of the original file with the
flagged address elements removed? Then you want XSLT, not Perl!
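XSLT is the natural fit for that kind of transformation. For comparison only, if you would rather stay in Perl, the removal can also be sketched with XML::LibXML's own DOM methods. This assumes the element layout discussed in this thread; the sample values are invented, and the result is printed rather than written back to a file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data: only the element names come from the thread.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<dataroot>
  <address>
    <code>A001</code>
    <lines><line>1 High Street</line></lines>
  </address>
  <address>
    <code>A002</code>
    <lines><line>2 Low Road</line><line>DO NOT USE</line></lines>
  </address>
</dataroot>
XML

my @removed;

# findnodes returns the complete list before the loop starts, so
# detaching nodes inside the loop does not disturb the iteration.
for my $address ($doc->findnodes(
        '/dataroot/address[lines/line[contains(., "DO NOT USE")]]')) {
    push @removed, $address->findvalue('code');       # record the code first
    $address->parentNode->removeChild($address);      # then detach the record
}

print "Removed: @removed\n";
print $doc->toString;
```

The whitespace text nodes around a removed element stay behind, so the output may contain blank lines where the records used to be.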

> I spent a lot of time on this today as this looks like an excellent
> parser and DOM navigator but I struggled moving around.

It is. I'm very impressed myself.

> In your example @results looks like it would contain references to
> all the /lines/line data with DO NOT USE in the string_value.

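No. The XPath expression

  /dataroot/address[contains(lines/line, "DO NOT USE")]

selects <address> elements, not <line> elements. Strictly speaking, contains()
converts the node-set lines/line to the string value of its first node, so this
form tests only the first <line> of each address. To select every <address>
that has at least one <line> element containing the string "DO NOT USE", put
the test in a nested predicate:

  /dataroot/address[lines/line[contains(., "DO NOT USE")]]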
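A caveat worth checking against your own data: in XPath 1.0, a node-set passed to contains() is converted to the string value of its first node only. The sketch below, on invented sample data, compares that form with a nested predicate that tests each <line> individually.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical data: A002 carries the marker on its second line, not the first.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<dataroot>
  <address>
    <code>A001</code>
    <lines><line>DO NOT USE</line><line>1 High Street</line></lines>
  </address>
  <address>
    <code>A002</code>
    <lines><line>2 Low Road</line><line>DO NOT USE</line></lines>
  </address>
</dataroot>
XML

# contains() sees only the string value of the first <line> in each address.
my @first_only = map { $_->findvalue('code') }
    $doc->findnodes('/dataroot/address[contains(lines/line, "DO NOT USE")]');

# The nested predicate applies contains() to each <line> in turn.
my @any_line = map { $_->findvalue('code') }
    $doc->findnodes('/dataroot/address[lines/line[contains(., "DO NOT USE")]]');

print "first line only: @first_only\n";
print "any line:        @any_line\n";
```

If the marker always sits on the first address line the two forms agree; the nested predicate is the safer choice when it can appear on any line.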

> What I am struggling with is that this is also a reference to the record as
> a whole and my navigation techniques are not working out. For example
> whenever I used findnodes I was getting every code in the file. I think now
> that was because I was using /dataroot/address as the starting point.

I'm not sure what you mean. In my code @results is a list of all marked
<address> nodes. Which nodes are found by the findnodes method depends on what
the current context node is, but only for relative paths: an expression that
starts with a slash, such as '//code', is always evaluated from the document
root, so both $doc->findnodes('//code') and $address->findnodes('//code')
return all of the <code> elements in the data. To get only the <code> elements
within that address you would write $address->findnodes('.//code'). I have used
$address->findvalue('code') because I want the text value of the node and I also
want to look for a <code> child of the <address> node instead of any <code>
descendant.
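The effect of the context node is easiest to see side by side. In this sketch the <alias> element is invented purely to give one address a <code> descendant that is not a direct child; the rest follows the element names used in this thread.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical data; <alias> is invented to give the second address
# a <code> descendant that is not a direct child.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<dataroot>
  <address>
    <code>A001</code>
    <lines><line>1 High Street</line></lines>
  </address>
  <address>
    <code>A002</code>
    <alias><code>B002</code></alias>
    <lines><line>DO NOT USE</line></lines>
  </address>
</dataroot>
XML

# Use the second <address> as the context node.
my ($address) = $doc->findnodes('/dataroot/address[2]');

# A leading slash makes the path absolute: the context node is ignored.
my @absolute = $address->findnodes('//code');

# A leading dot anchors the search at the context node.
my @descendant = $address->findnodes('.//code');

# A bare name selects only direct children of the context node.
my $child = $address->findvalue('code');

printf "//code: %d, .//code: %d, code: %s\n",
    scalar @absolute, scalar @descendant, $child;
```

Here '//code' finds every <code> in the document, './/code' finds the two under the second address, and 'code' yields only the direct child's text.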

> Aside from CPAN, I would appreciate any other sources of info about
> using libXML with perl and xpath expressions. It is
> whoppingly fast.

If you're doing a lot of XML work then I wholeheartedly recommend O'Reilly's
volumes

XML in a Nutshell, Third Edition
XPath and XPointer
XSLT

HTH,

Rob

