Don Elliott wrote:
>
> Hi,
>
> I'm having some trouble trying to easily remove lines from a data file using
> a regular expression. I can do it by reading the file in
> a line at a time then deciding whether to chuck it or write it out. My data
> looks something like this -
>
> ENQ:SIMS RE:ELLIOTT,DONALD
> ELLIOTT,DONALD,LAWRENCE
> - DOB 1963SEP30 SEX:M
> 223 OREGAN CR SCORE:27
> BUSINESS: 306-975-8315
> RELATED EVENTS
> GO0024158 1997APR18 COMPLAINANT CRIMINAL ACTIVIT
> <<TAG:REPORT:GO 1997 0024158>>
> GO0006897 1987FEB26 REG OWNER SEIZED VEHICLES
> <<TAG:REPORT:GO 1987 0006897>>
> AC0040436 2002MAY21 REG OWNER FAIL TO ST/REMAI
> <<TAG:REPORT:AC 2002 0040436>>
> AC0000072 1994JAN04 DRIVER IN NON FAT INJ ACC
> <<TAG:REPORT:AC 1994 0000072>>
> ----------------------------------------------------------------------
> <<TAG:REPORT:DATA Complicated multi-line
> tags are possible. This really complicates
> my parsing >>
> MORE MATCHING PERSONS ON FILE
>
> What I need to do is to remove all of the 'tags' from the file.
> My best attempt so far has been
>
> $file_with_no_tags =~ s/<<TAG:.+>>//sig;
>
> which removes everything from the first '<<TAG:' to the last '>>'.
>
> Is there a better way? (Actually any way that works would be better)
>
> At a different part in my program I need to collect all of the tags.
> this is the code I use for that -
>
> my %tag_hash;
> my @lines = split /\n/,$src;
> my ($in_tag, $long_tag);
> $in_tag = 'FALSE';
> foreach my $line (@lines) {
>     if ($line =~ /<<TAG.+>>/ims) {    # tag is contained in one line
>         my ($label,$tagname,$tagval) = split /:/,$line,3;
>         chop $tagval;    #remove trailing >
>         chop $tagval;    #remove trailing >
>         $tag_hash{$tagname} = $tagval;
>     }
>     elsif ($line =~ /<<TAG/i) {    # start of a multi-line tag
>         $in_tag = 'TRUE';
>         $long_tag = $line;
>     }
>     elsif ($in_tag eq 'TRUE' and $line =~ />>/i) {    # end of a multi-line tag
>         $in_tag = 'FALSE';
>         $long_tag = "$long_tag\n$line";
>         my ($label,$tagname,$tagval) = split /:/,$long_tag,3;
>         chop $tagval;    #remove trailing >
>         chop $tagval;    #remove trailing >
>         $tag_hash{$tagname} = $tagval;
>     }
>     elsif ($in_tag eq 'TRUE') {    #middle of a multi-line tag
>         $long_tag = "$long_tag\n$line";
>     }
> }
>
> This strikes me as being a little long to do something this simple in perl.
>
> Can anyone point me in a better/shorter/more easily understood direction?

If you want something shorter, this should do what you want:

my %tag_hash;
for my $tag ( $src =~ /<<TAG:(.+?)>>/isg ) {
    my ( $tagname, $tagval ) = split /:/, $tag, 2;
    $tag_hash{$tagname} = $tagval;
}

John
--
use Perl;
program
fulfillment
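
For anyone who wants to try this, here is a minimal, self-contained sketch of the approach in the reply, with the same pattern also applied to the tag-removal question. The sample $src is made up for illustration (including the NOTE tag name), and the removal line is an assumption about what was wanted, not something stated in the reply. The key difference from the original attempt is .+? in place of .+: the non-greedy form stops at the nearest '>>' rather than the last one.

```perl
use strict;
use warnings;

# Hypothetical sample input, shortened from the data in the question.
my $src = join "\n",
    'GO0024158 1997APR18 COMPLAINANT CRIMINAL ACTIVIT',
    '<<TAG:REPORT:GO 1997 0024158>>',
    '----------',
    '<<TAG:NOTE:Complicated multi-line',
    'tags are possible>>',
    'MORE MATCHING PERSONS ON FILE',
    '';

# Collect tags. In list context a /g match returns every captured group.
# /s lets '.' match newlines, so multi-line tags are matched as well.
my %tag_hash;
for my $tag ( $src =~ /<<TAG:(.+?)>>/isg ) {
    # Split on the first ':' only, so colons inside the value survive.
    my ( $tagname, $tagval ) = split /:/, $tag, 2;
    $tag_hash{$tagname} = $tagval;
}

# Remove tags from a copy of the text, using the same non-greedy pattern.
( my $file_with_no_tags = $src ) =~ s/<<TAG:.+?>>//sig;

print "$_ => $tag_hash{$_}\n" for sort keys %tag_hash;
print $file_with_no_tags;
```

Two caveats. Because the hash is keyed on the tag name, a name that occurs more than once (every tag in the question's sample is REPORT) keeps only its last value; pushing the values onto an array per name would keep them all. The substitution also leaves an empty line wherever a tag had a line to itself; adding \n? after >> in the pattern is one way to drop those as well.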