[EMAIL PROTECTED] wrote:
> I am a total newbie in Perl.
Hello and welcome.
> I have an HTML file with some junk after </html>,
> so I am trying to clean it.
You might want the htmlclean program:
http://search.cpan.org/~lindner/HTML-Clean-0.8/bin/htmlclean
Or the HTML::Clean module:
http://search.cpan.org/~lindner/HTML-Clean-0.8/lib/HTML/Clean.pm
> This is how I started out. It's inside a Unix shell script, so I must
> test it on the command line like this:
> % cat file.html | perl -ne '{$/="</HTML>" ; if ($_ =~ m#</html>#i)
> { print $_ } }'
$ perl -MO=Deparse -ne '{$/="</HTML>" ; if ($_ =~ m#</html>#i) { print $_ } }'
LINE: while (defined($_ = <ARGV>)) {
    {
        $/ = '</HTML>';
        if ($_ =~ m[</html>]i) {
            print $_;
        }
    }
}
-e syntax OK
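You are setting the Input Record Separator ($/) to "</HTML>" only after
the first line has been read, so that line is read with the default
separator ("\n") and is printed only if it happens to contain </html>
itself. Also, $/ is a literal, case-sensitive string, not a regex: if the
tag is not exactly '</HTML>', the rest of the file is read as one record
and the junk is printed along with everything else. And you are using
'cat' when you don't need to, since perl can read the file named on its
command line. You probably want something like this: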
perl -ne'print if 1 .. m[</html>]i' file.html
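To make the points above concrete, here is a minimal sketch (not from the
original thread) that runs on a hypothetical in-memory sample whose closing
tag is lowercase. The sub names first_record and through_closing_tag are
made up for illustration. The first sub sets $/ before anything is read,
which fixes the timing problem but still shows the case problem; the second
is the flip-flop one-liner written out as an explicit loop, much like the
loop that -n generates in the Deparse output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: lowercase closing tag, followed by junk.
my $html = "<html>\n<body>hello</body>\n</html>\njunk line 1\njunk line 2\n";

# 1. The original idea, fixed so that $/ is set BEFORE the first read.
#    $/ is a literal, case-sensitive string, so '</HTML>' never matches
#    '</html>' here and the one record read is the whole input, junk included.
sub first_record {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $sep;       # old value is restored when the sub returns
    my $record = <$fh>;    # everything up to and including the first $sep
    close $fh;
    return $record;
}

# 2. The flip-flop version, equivalent to:  perl -ne'print if 1 .. m[</html>]i'
sub through_closing_tag {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $out = '';
    while (<$fh>) {
        # In scalar context '..' is the flip-flop operator. The constant 1 is
        # compared with $. (the current line number), so the range turns on at
        # line 1 and turns off after the first line matching </html>, any case.
        # Each '..' keeps its on/off state between calls, so input lacking a
        # closing tag would leave it switched on for the next call.
        $out .= $_ if 1 .. m[</html>]i;
    }
    close $fh;
    return $out;
}

print "--- \$/ = '</HTML>' ---\n", first_record($html, '</HTML>');
print "--- flip-flop ---\n", through_closing_tag($html);
```

If you would rather keep the "read up to the tag" idea but match the tag in
any case, one alternative is to slurp the whole file and cut it with a
case-insensitive regex, e.g. perl -0777 -pe 's{(</html\s*>).*}{$1}is' file.html
(this assumes the file fits comfortably in memory).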
> OK, I wrote it by imitating other examples. I don't know why I use the
> -n switch. The switches are not described in man perl; it only lists
> them in the synopsis line.
The command-line switches, including -n, are documented in perlrun:
perldoc perlrun
Or:
man perlrun
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
--
http://learn.perl.org/