On Jan 24, 2008 11:19 PM,  <[EMAIL PROTECTED]> wrote:
snip
> I have a html file with some junk after </html?
>
> So I am trying to clean it.
>
> This is how I started out. Its inside a unix shell script so I must
> test on a command line like this:
>
> % cat file.html | perl -ne '{$/="</HTML>" ; if ($_ =~ m#</html>#i)
> { print $_ } }'
>
>
> OK I wrote it by imitating other examples.
>
> I dont know why I use switch -n . These are not described in man perl.
> It only lists all switches in syntax line.
snip

This is because they are documented in perlrun.

from man perl
           perlrun             Perl execution and options

Also, you should be using perldoc instead of man to view the Perl
documentation.  The man command works fine in a pinch, but perldoc is
more robust and is available on all systems Perl runs on (even ones
that don't have a man command).

The -n option creates a loop that iterates over every line* in the
input file(s).

snip
> It always miss the top line.
snip

Let's take a look at what Perl is seeing:

 perl -MO=Deparse -ne '{$/="</HTML>" ; if ($_ =~ m#</html>#i) { print $_ } }'

LINE: while (defined($_ = <ARGV>)) {
    {
        $/ = '</HTML>';
        if ($_ =~ m[</html>]i) {
            print $_;
        }
    }
}
-e syntax OK

From this we can see that you are setting the record separator
variable inside the loop, after the first line has already been read.
We need to set the record separator before we start reading from the
file.  There are a couple of ways of doing this, but the easiest from
your perspective is to use a BEGIN block to set $/ before the loop:

perl -ne 'BEGIN {$/="</HTML>"} print if m#</html>#i' file.html

* where line is defined as a sequence of characters ending with the
string** in $/
** $/ can hold some special values as well, see perldoc perlvar or
http://perldoc.perl.org/perlvar.html#$/
