On Thu, 5 May 2005, Jason Van Cleve wrote:
> Quoth Bob Miller, on Thu, 5 May 2005 18:11:06 -0700:
>
> > No, sed is precise with regard to newlines. (-: I was wondering if
>
> The man page isn't very. It doesn't mention "$" wrt regex's, just "\n":
From `man 7 regex`:
An atom is a regular expression enclosed in `()' (matching a
match for the regular expression), an empty set of `()' (matching
the null string)(!), a bracket expression (see below), `.'
(matching any single character), `^' (matching the null string
at the beginning of a line), `$' (matching the null string at
the end of a line), a `\' followed by one of the characters
`^.[$()|*+?{\' (matching that character taken as an ordinary
character), a `\' followed by any other character(!) (matching
that character taken as an ordinary character, as if the `\' had
not been present(!)), or a single character with no other
significance (matching that character). A `{' followed by a
character other than a digit is an ordinary character, not the
beginning of a bound(!). It is illegal to end an RE with `\'.
The "null string at the end of a line" means that the $ doesn't gobble
any characters in the matching line. Precise, tho not exactly
vernacular.
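A quick way to see the zero-width behaviour for yourself. This is
only a sketch: append_bang and swap_last_o are names I made up for
the illustration, and it assumes a POSIX shell with printf and sed.

```shell
# $ matches the empty string at end of line, so substituting on it
# appends text without consuming any character of the line.
append_bang() { sed 's/$/!/'; }

# o$ only matches when "o" is the very last character before the newline.
swap_last_o() { sed 's/o$/X/'; }

printf 'foo\n' | append_bang     # prints: foo!
printf 'foo\n' | swap_last_o     # prints: foX
```

Feed swap_last_o a line that has anything at all between the "o"
and the newline and the substitution simply does not fire.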
BTW, with some versions of sed you can use \r to specify the CR
character.
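Where your sed lacks \r, one workaround is to let printf build the
CR and splice it into the sed script. A minimal sketch, assuming a
POSIX shell; the names cr and strip_cr are mine, not anything sed
provides.

```shell
# Command substitution strips trailing newlines only, so the CR survives.
cr=$(printf '\r')

# Delete a CR that sits directly before the end of the line.
# Inside double quotes, \$ reaches sed as a plain $ anchor.
strip_cr() { sed "s/${cr}\$//"; }

# od -c makes the (now missing) CR visible.
printf 'dos line\r\n' | strip_cr | od -c
```

Lines that never had a CR pass through unchanged.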
> "REGULAR EXPRESSIONS
> POSIX.2 BREs should be supported, but they aren't completely because of
> performance problems. The \n sequence in a regular expression matches
> the newline character, and similarly for \a, \t, and other sequences."
>
> All I could find on POSIX.2 BREs defines "$" in terms of "the end of a
> line":
>
> http://docsrv.sco.com:507/en/man/html.M/regexp.M.html
>
> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_08
>
> I can only surmise sed is thus free to treat "^M" as part of the RE or
> as part of the "end of a line", and opts for the former.
No. Under Windoze ^M^J *is* a new line. Under *nix, ^J is newline.
So if you pass a ^M^J to Linux that ^M is seen as just another
character, and treated as such. If you want to change it to
some other character or string using sed, you are free to do so.
sed -e "s/^M/Bogus Newline/"
And this is how it should be. Because if sed accepted *either*
^M^J or ^J as a new line then how would we specifically handle
files that had a mix of these if we wanted to treat the two
differently?
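For instance, with a mixed file you can label each kind of line
ending separately, which only works because sed treats the CR as
ordinary data. A rough sketch, assuming a POSIX shell; tag_endings
and the [LF]/[CRLF] labels are made up for the example.

```shell
# Literal CR built by printf, so no \r escape is needed inside sed.
cr=$(printf '\r')

# 1st expression: a CR at end of line is replaced by " [CRLF]".
# t (no label): if that substitution happened, skip the rest of the script.
# 3rd expression: every remaining line gets " [LF]" appended.
tag_endings() {
    sed -e "s/${cr}\$/ [CRLF]/" -e t -e 's/$/ [LF]/'
}

printf 'unix\ndos\r\n' | tag_endings
# prints:
#   unix [LF]
#   dos [CRLF]
```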
Having programs try to do what they think you want rather than
what you specifically say may feel good when you are first
starting. But pretty soon they get in the way when you try
to do something difficult.
> Cheers,
> --Jason V. C.
> --
> Revolution from Below! GPL the Constitution!
The Constitution already is open source. Fortunately.
And most of the laws are. Exception: "patriot act".
The comment below describes Bill Gates and clan...
--
Allen Brown
work: Agilent Technologies non-work: http://www.peak.org/~abrown/
Those who do not understand Unix are condemned to reinvent it, poorly.
--- Henry Spencer
_______________________________________________
EUGLUG mailing list
http://www.euglug.org/mailman/listinfo/euglug