On Jan 24, 2008 11:19 PM, <[EMAIL PROTECTED]> wrote:
snip
> I have a html file with some junk after </html?
>
> So I am trying to clean it.
>
> This is how I started out. Its inside a unix shell script so I must
> test on a command line like this:
>
> % cat file.html | perl -ne '{$/="</HTML>" ; if ($_ =~ m#</html>#i)
> { print $_ } }'
>
>
> OK I wrote it by imitating other examples.
>
> I dont know why I use switch -n . These are not described in man perl.
> It only lists all switches in syntax line.
snip

This is because they are documented in perlrun. From man perl:

    perlrun             Perl execution and options

Also, you should be using perldoc instead of man to view the Perl
documentation. The man command works fine in a pinch, but perldoc is
more robust and is available on all systems Perl runs on (even ones
that don't have a man command).

The -n option creates a loop that iterates over every line* in the
input file(s).

snip
> It always miss the top line.
snip

Let's take a look at what Perl is seeing:

    perl -MO=Deparse -ne '{$/="</HTML>" ; if ($_ =~ m#</html>#i) { print $_ } }'
    LINE: while (defined($_ = <ARGV>)) {
        {
            $/ = '</HTML>';
            if ($_ =~ m[</html>]i) {
                print $_;
            }
        }
    }
    -e syntax OK

From this we can see that you are setting the record separator
variable inside the loop, after the first line has already been read
using the default separator (a newline). Unless that first line happens
to contain </html>, it fails the match and is never printed. Obviously
we need to set the record separator before we start reading from the
file. There are a couple of ways of doing this, but the easiest from
your perspective is to use a BEGIN block to set $/ before the loop:

    perl -ne 'BEGIN {$/="</HTML>"} print if m#</html>#i' file.html

*  where line is defined as a sequence of characters ending with the
   string** in $/
** $/ can hold some special values as well, see perldoc perlvar or
   http://perldoc.perl.org/perlvar.html#$/