Greedy Regular Expression

Elliott, Don (Police) Fri, 26 Jul 2002 17:01:24 -0700

Hi,

I'm having some trouble trying to easily remove lines from a data file using
a regular expression. I can do it by reading the file in
a line at a time then deciding whether to chuck it or write it out. My data
looks something like this -


ENQ:SIMS RE:ELLIOTT,DONALD
ELLIOTT,DONALD,LAWRENCE
 - DOB 1963SEP30 SEX:M
   223 OREGAN CR             SCORE:27
  BUSINESS: 306-975-8315
   RELATED EVENTS
GO0024158 1997APR18 COMPLAINANT CRIMINAL ACTIVIT
<<TAG:REPORT:GO 1997 0024158>>
GO0006897 1987FEB26 REG OWNER   SEIZED VEHICLES
<<TAG:REPORT:GO 1987 0006897>>
AC0040436 2002MAY21 REG OWNER   FAIL TO ST/REMAI
<<TAG:REPORT:AC 2002 0040436>>
AC0000072 1994JAN04 DRIVER IN   NON FAT INJ ACC
<<TAG:REPORT:AC 1994 0000072>>
----------------------------------------------------------------------
<<TAG:REPORT:DATA Complicated multi-line
tags are possible. This really complicates
my parsing >>
MORE MATCHING PERSONS ON FILE

What I need to do is remove all of the 'tags' from the file.
My best attempt so far has been:

$file_with_no_tags =~ s/<<TAG:.+>>//sig;

which removes everything from the first '<<TAG:' to the last '>>'.

Is there a better way? (Actually, any way that works would be better.)
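
A minimal sketch of one possible fix, assuming a tag body never contains '>>' itself: putting `?` after `.+` makes the quantifier non-greedy, so each match stops at the nearest '>>' instead of the last one. The sample string and the optional `\n?` (which drops the line break left behind by each tag) are illustrative assumptions, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical sample data modelled on the excerpt above.
my $file_with_no_tags = <<'END';
GO0024158 1997APR18 COMPLAINANT CRIMINAL ACTIVIT
<<TAG:REPORT:GO 1997 0024158>>
<<TAG:REPORT:DATA Complicated multi-line
tags are possible. This really complicates
my parsing >>
MORE MATCHING PERSONS ON FILE
END

# .+? is non-greedy: it stops at the first '>>' after each '<<TAG:'.
# /s lets '.' match newlines, so multi-line tags are covered too.
# \n? also removes the line break that followed the tag (an assumption
# about the desired output).
$file_with_no_tags =~ s/<<TAG:.+?>>\n?//sig;

print $file_with_no_tags;
```

With the greedy `.+` the same substitution would delete everything between the first '<<TAG:' and the final '>>', including the data lines in between.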

At a different point in my program I need to collect all of the tags.
This is the code I use for that:

    my %tag_hash;
    my @lines = split /\n/,$src;
    my ($in_tag, $long_tag);
    $in_tag = 'FALSE';
    foreach my $line (@lines) {
        if ($line =~ /<<TAG.+>>/ims) {                   # tag is contained in one line
            my ($label,$tagname,$tagval) = split /:/,$line,3;
            chop $tagval;    #remove trailing >
            chop $tagval;    #remove trailing >
            $tag_hash{$tagname} = $tagval;
        }
        elsif ($line =~ /<<TAG/i) {                      # start of a multi-line tag
            $in_tag = 'TRUE';
            $long_tag = $line;
        }
        elsif ($in_tag eq 'TRUE' and $line =~ />>/i) {   # end of a multi-line tag
            $in_tag = 'FALSE';
            $long_tag = "$long_tag\n$line";
            my ($label,$tagname,$tagval) = split /:/,$long_tag,3;
            chop $tagval;    #remove trailing >
            chop $tagval;    #remove trailing >
            $tag_hash{$tagname} = $tagval;
        }
        elsif ($in_tag eq 'TRUE') {                      # middle of a multi-line tag
            $long_tag = "$long_tag\n$line";
        }
    }

This strikes me as rather long for something this simple in Perl.

Can anyone point me in a better/shorter/more easily understood direction?
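
One shorter direction, offered as a sketch rather than a drop-in replacement: a single global match with a non-greedy body can walk every tag, single-line or multi-line, without tracking state line by line. The `$src` sample below is hypothetical, and storing an array of values per tag name is an assumption (the data has several REPORT tags, which would overwrite one another in a plain hash).

```perl
use strict;
use warnings;

# Hypothetical sample data modelled on the excerpt above.
my $src = <<'END';
GO0024158 1997APR18 COMPLAINANT CRIMINAL ACTIVIT
<<TAG:REPORT:GO 1997 0024158>>
AC0040436 2002MAY21 REG OWNER   FAIL TO ST/REMAI
<<TAG:REPORT:AC 2002 0040436>>
<<TAG:REPORT:DATA Complicated multi-line
tags are possible. This really complicates
my parsing >>
END

my %tag_hash;
# ([^:]+) captures the tag name; (.*?) captures the value up to the
# nearest '>>'. /s lets '.' cross newlines, /g finds every tag in turn.
while ($src =~ /<<TAG:([^:]+):(.*?)>>/sig) {
    my ($tagname, $tagval) = ($1, $2);
    push @{ $tag_hash{$tagname} }, $tagval;   # keep every value per name
}

for my $name (sort keys %tag_hash) {
    print "$name: $_\n" for @{ $tag_hash{$name} };
}
```

Because the captures exclude the closing '>>', the two `chop` calls are no longer needed.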

Don 

