Lightning flashed, thunder crashed and "Rajeev Rumale" <[EMAIL PROTECTED]>
whispered:
| I am attaching both perl code and html file (zipped together).  Please
| check out and let me know where I am wrong.

In the future, please don't use attachments (especially binary formats like
zip) for this kind of thing.  It is so much easier to deal with when it is
just inlined ASCII.

Here's your original code, minus a few extra blank lines:
1: # pattern matching and extraction.
2: 
3: open (FILE, "test.htm");
4: 
5: @lines = <FILE>;
6: 
7: $text = join("",@lines);
8: 
9: $text=~s/\n/%%new_line%%/g;  # the script does not work correctly if commented.
10:print "found" if $text=~m/%%row_start%%(.+)%%row_end%%/;
11:
12:$tstart ="%%row_start%%";
13:$tend = "%%row_end%%";
14:
15:$text=~/(($tstart)(.+)($tend))/;
16:
17:print "The complete pattern is :\n $1\n\n";
18:print "The string without delimiters is :\n $3\n\n";


I'll go ahead and do a complete code review for you, even though you are
only asking a specific question.

First of all, in line 3 you aren't checking your return code, which you
should always do.  In lines 5-7, you read the file into an array, then join
the array into a scalar.  This is taking at least twice the amount of
memory you need.  If this happened to be a large file you were working
with, this could severely impact the performance (possibly killing the
script if you ran out of memory).

Next you do the same match twice, once with the delimiters inlined and once
with them set as variables.  You then save several chunks of information
that aren't particularly needed.

Have you checked out the "s" modifier for the m// and s/// operators?  From
perlre:

       s   Treat string as single line.  That is, change "." to
           match any character whatsoever, even a newline, which
           normally it would not match.
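
A small sketch of what that modifier changes.  The two-line $sample string
is made up for illustration; it is not taken from your test.htm:

```perl
use strict;
use warnings;

# Hypothetical sample data: the two delimiters sit on different lines.
my $sample = "%%row_start%%first\nsecond%%row_end%%";

# Without /s, "." stops at the newline, so the match fails.
my $without_s = ($sample =~ /%%row_start%%(.+)%%row_end%%/)  ? 1 : 0;

# With /s, "." also matches "\n", so the match succeeds.
my $with_s    = ($sample =~ /%%row_start%%(.+)%%row_end%%/s) ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```

This is why your script only worked after replacing every "\n" with a
placeholder: the substitution was doing by hand what /s does for you.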

Finally you print the answers.  Here's my rewrite of your code, taking
advantage of several of perl's strengths:

1: # pattern matching and extraction.
2:
3: open FILE, "test.htm" or die "Can't open test.htm: $!\n";
4: $/ = undef;
5: $text = <FILE>;
6: close FILE;
7:
8: if ($text =~ /%%row_start%%(.+)%%row_end%%/s) {
9:      print "found\n\n";
10:     print "The complete pattern is :\n $&\n\n";
11:     print "The string without delimiters is :\n $1\n\n";
12:}

As you can see, I've saved six lines of code and a large chunk of memory.
On line 3, we check the status of the open.  Sure, in this case, it didn't
particularly matter.  But it is always best to check.  Line 4 sets the
input record separator to "undef".  Doing this causes line 5 to read the
entire file into a single string.  Line 6 closes the filehandle.  Perl
automatically closes all opened filehandles when the script shuts down, but
it is "polite" to clean up after yourself.
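
One caveat: assigning to $/ changes it for the rest of the script.  A
common variation, sketched here under the assumption of Perl 5.6 or later,
limits the change to one block with "local".  The slurp_file name is
hypothetical (my own helper, not a builtin):

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole contents of a file as one string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    local $/;            # undef only inside this sub; restored on return
    my $text = <$fh>;    # with $/ undefined, one read returns the whole file
    close($fh);
    return $text;
}
```

You would call it as $text = slurp_file("test.htm"); any later line-by-line
reads elsewhere in the script still see the default "\n" separator.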

Line 8 checks to see if $text contains the two delimiters we're looking
for.  Note the /s at the end, which treats the string as a single line, so
that "." matches the newline characters too.  Also note that we capture
only the text between the delimiters, and nothing else.  This is, again, a
memory saver.  Line 10 prints $&, a special variable that contains the text
matched by the last successful pattern match.  Line 11 prints $1, the text
we captured in line 8.
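
To make the difference between $& and $1 concrete, here is a minimal
sketch.  The $text string is invented sample data standing in for the
contents of test.htm:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the contents of test.htm.
my $text = "<tr>%%row_start%%\n<td>cell</td>\n%%row_end%%</tr>";

my ($whole, $inner);
if ($text =~ /%%row_start%%(.+)%%row_end%%/s) {
    $whole = $&;    # everything the pattern matched, delimiters included
    $inner = $1;    # only what the parentheses captured
}

print "whole: [$whole]\n";
print "inner: [$inner]\n";
```

Here $whole starts with %%row_start%% and ends with %%row_end%%, while
$inner holds just the lines in between.  The surrounding <tr> tags appear
in neither, because they lie outside the match.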

-spp
--
Stephen P Potter                                         [EMAIL PROTECTED]
"You can't just magically invoke Larry and expect that to prove your point.
Or prove that you have a point."        -Simon Cozens
UNIX and Perl Consulting and Training         http://www.unixlabs.net/~spp/
