Re: Matching Over Multiple Lines

Jim Gibson Tue, 17 Mar 2009 09:39:50 -0700

On Tue, Mar 17, 2009, 8:45 AM, "Jeff Westman" <westf...@gmail.com>
scribbled:


> All,
> 
> I know this has been asked many times, and I have read the documentation
> (perldoc -q "matching over more than one line") and still can't make heads
> or tails of this.
> 
> I have a problem where my pattern can be in one line, or span multiple
> lines.  This is what I have so far (simplified):
> 
> 
> #!/bin/perl
> use warnings;
> use strict;
> $/ = '';
> my $pat = << 'EOF';
> ^
>     AAA
>     (.*)?
>     ZZZ
> $
> EOF
> my $file = "./mytext";
> while (<DATA>) {
>     if ( /$pat/gms ) {
>        print "vvvvvvvvvvvvvvvvvv\n";
>        print "FOUND: $_";
>        print "^^^^^^^^^^^^^^^^^^\n";
>     }
>     else {
>        print "vvvvvvvvvvvvvvvvvv\n";
>        print "NOT FOUND: $_";
>        print "^^^^^^^^^^^^^^^^^^\n";
>     }
> }
> __DATA__
> AAAthis is the beginning of our text
> and it will continue over a few lines.
> In case you are not sure of what you
> see, you should check the document
> yourself.ZZZ
> This part has nothing to do whatsoever
> with the above text, but to be sure,
> you should not see this.
> AAABut this single line you should seeZZZ
> AAABut this double line, so the
> question is do you see itZZZ
> This part you will not see.
> 
> 
> I am not sure if I can use $/ to gobble up a paragraph, since we are reading
> and parsing XML files, which are around 10M in size.  I need to do a
> non-greedy pattern match.
> 
> 
> Can someone tell me what I am doing wrong please?

You are trying to use regular expressions to parse XML. You should use an
XML parser for that job. That is what they are designed for. There are many
traps in parsing XML, and you do not need to overcome each one, because it
has already been done for you. I have used XML::Parser, XML::Simple, and
XML::Twig successfully in the past.

As far as why your program is not working on your test data, it is because
you are attempting to match only one line at a time. In order to match
against multiple lines, you need to have multiple lines in your scalar
string variable.

The easiest way to do this is to read the entire file at once:

    my $file = do { local $/; <DATA> };

A 10 MB file should be no problem for Perl.
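
As a rough sketch of the slurp approach, something like the following should
work on the sample data. The helper name find_blocks and the in-memory
filehandle are only for illustration and are not from the original program.
The pattern is written inline because a pattern built from a here-document
and used without /x treats its spaces and newlines as literal characters.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between each AAA ... ZZZ pair.
sub find_blocks {
    my ($text) = @_;
    # /s lets "." match newlines, ".*?" is non-greedy so each match stops at
    # the first ZZZ, and /g in list context returns every captured group.
    return $text =~ /AAA(.*?)ZZZ/gs;
}

# Sample input, abbreviated from the test data in the question.
my $sample = <<'EOF';
AAAthis is the beginning of our text
and it will continue over a few lines.
yourself.ZZZ
This part has nothing to do whatsoever
AAABut this single line you should seeZZZ
This part you will not see.
EOF

# An in-memory filehandle stands in for the real file here.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $contents = do { local $/; <$fh> };   # undefined $/ reads everything at once
close $fh;

my @blocks = find_blocks($contents);
print "FOUND: $_\n" for @blocks;
```

With a real file, the open call would take the file name instead of the
reference to $sample; the rest stays the same.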

If you can't read the entire file at once, then you can code up a scheme
that looks for the beginning token (AAA) and the ending token (ZZZ) in
separate regexes and saves the lines in between.
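
One possible shape for such a scheme is sketched below. The helper name
collect_blocks and the in-memory filehandle are made up for the example, and
the sketch assumes at most one AAA ... ZZZ block starts on any given line.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh one line at a time and return the text
# between each AAA ... ZZZ pair, whether or not it spans several lines.
sub collect_blocks {
    my ($fh) = @_;
    my @blocks;
    my $buffer = '';
    my $inside = 0;                       # true while between AAA and ZZZ

    while (my $line = <$fh>) {
        if (!$inside) {
            # Skip lines until the beginning token; keep what follows it.
            next unless $line =~ /AAA(.*)/s;
            ($buffer, $inside) = ($1, 1);
        }
        else {
            $buffer .= $line;             # continuation line of an open block
        }

        # The ending token closes the block; keep only what precedes it.
        if ($buffer =~ /^(.*?)ZZZ/s) {
            push @blocks, $1;
            ($buffer, $inside) = ('', 0);
        }
    }
    return @blocks;
}

# Sample input, abbreviated from the test data in the question.
my $sample = <<'EOF';
This part has nothing to do whatsoever
AAABut this single line you should seeZZZ
AAABut this double line, so the
question is do you see itZZZ
This part you will not see.
EOF

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @blocks = collect_blocks($fh);
close $fh;

print "FOUND: $_\n" for @blocks;
```

Only the current block is held in memory, so this also works when the file
is too large to slurp.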



