[EMAIL PROTECTED] wrote:
I am a total newbie in perl

Hello and welcome.

I have an HTML file with some junk after </html>

So I am trying to clean it.

You might want the htmlclean program:

http://search.cpan.org/~lindner/HTML-Clean-0.8/bin/htmlclean

Or the HTML::Clean module:

http://search.cpan.org/~lindner/HTML-Clean-0.8/lib/HTML/Clean.pm


This is how I started out. It's inside a Unix shell script, so I must
test on a command line like this:

% cat file.html | perl -ne '{$/="</HTML>" ; if ($_ =~ m#</html>#i)
{ print $_ } }'

$ perl -MO=Deparse -ne '{$/="</HTML>" ; if ($_ =~ m#</html>#i) { print $_ } }'
LINE: while (defined($_ = <ARGV>)) {
    {
        $/ = '</HTML>';
        if ($_ =~ m[</html>]i) {
            print $_;
        }
    }
}
-e syntax OK

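You are setting the Input Record Separator ($/) to "</HTML>" only after the first line has been read, so the first record is an ordinary line and will not be printed unless it happens to contain the closing tag. Also, $/ is a literal, case-sensitive string, so if the tag in the file is not exactly '</HTML>' the separator will never match. And you are using 'cat' when you don't need to, since perl can read the file named on the command line. You probably want something like this: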

perl -ne'print if 1 .. m[</html>]i' file.html


OK I wrote it by imitating other examples.

I don't know why I use the -n switch. The switches are not described in man perl.
It only lists them all in the synopsis line.

The command line switches are listed in perlrun:

perldoc perlrun

Or:

man perlrun



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall
