Re: Resolved: [Eug-lug] Source Code Processing

Allen Brown Thu, 05 May 2005 21:26:24 -0700

On Thu, 5 May 2005, Jason Van Cleve wrote:
> Quoth Bob Miller, on Thu, 5 May 2005 18:11:06 -0700:
> 
> > No, sed is precise with regard to newlines. (-: I was wondering if
> 
> The man page isn't very.  It doesn't mention "$" wrt regex's, just "\n":


From `man 7 regex`:
       An atom is a regular expression enclosed  in  `()'  (matching  a
       match  for the regular expression), an empty set of `()' (match-
       ing the null string)(!), a bracket expression (see  below),  `.'
       (matching  any  single character), `^' (matching the null string
       at the beginning of a line), `$' (matching the  null  string  at
       the  end  of  a  line),  a `\' followed by one of the characters
       `^.[$()|*+?{\' (matching that character  taken  as  an  ordinary
       character),  a `\' followed by any other character(!)  (matching
       that character taken as an ordinary character, as if the `\' had
       not  been  present(!)), or a single character with no other sig-
       nificance (matching that character).  A `{' followed by a  char-
       acter  other  than  a  digit  is  an ordinary character, not the
       beginning of a bound(!).  It is illegal to end an RE with `\'.

The "null string at the end of a line" means that the $ doesn't consume
any characters from the line; it matches only the position at the end.
Precise, though not exactly vernacular.

BTW, some versions of sed (GNU sed, for one) let you use \r to specify
the CR character.
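
A small sketch of both points, assuming a POSIX shell and sed.  The
name strip_cr is made up for illustration, and the CR is built with
printf so the example does not depend on \r being understood inside
the regex:

```shell
# Build a literal CR; \r inside a sed regex is an extension, not POSIX.
cr=$(printf '\r')

# `$` matches the empty string at the end of the line, so substituting
# there appends text and leaves every existing character in place.
printf 'abc\n' | sed -e 's/$/<END>/'
# prints: abc<END>

# strip_cr (hypothetical helper): delete one CR that sits immediately
# before the end of each line.
strip_cr() { sed -e "s/${cr}\$//"; }

printf 'abc\r\n' | strip_cr
# prints: abc
```

Where sed does accept \r, the substitution can be written as
's/\r$//' instead.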

> "REGULAR EXPRESSIONS
> POSIX.2 BREs should be supported, but they aren't completely because of
> performance  problems.  The \n sequence in a regular expression matches
> the newline character, and similarly for \a, \t, and other sequences."
> 
> All I could find on POSIX.2 BREs defines "$" in terms of "the end of a
> line":
> 
> http://docsrv.sco.com:507/en/man/html.M/regexp.M.html
> 
> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_08
> 
> I can only surmise sed is thus free to treat "^M" as part of the RE or
> as part of the "end of a line", and opts for the former.

No.  Under Windoze, ^M^J *is* a newline.  Under *nix, ^J is the newline.
So if you pass a ^M^J to Linux that ^M is seen as just another
character, and treated as such.  If you want to change it to
some other character or string using sed, you are free to do so.
  sed -e "s/^M/Bogus Newline/"
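
In that command ^M stands for a literal carriage return (in most
shells typed as Ctrl-V Ctrl-M), not a caret followed by the letter M.
A sketch of the difference, assuming a POSIX shell and sed; mark_cr is
a hypothetical name:

```shell
# Literal CR, built with printf so it survives copy and paste.
cr=$(printf '\r')

# mark_cr (hypothetical helper): replace the first CR on each line
# with visible text, the same substitution as the command above.
mark_cr() { sed -e "s/${cr}/Bogus Newline/"; }

printf 'one\r\ntwo\n' | mark_cr
# prints:
# oneBogus Newline
# two

# Typed as two characters, ^M is an anchor plus the letter M, so it
# matches an M at the start of the line instead of a CR.
printf 'Mars\n' | sed -e 's/^M/m/'
# prints: mars
```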

And this is how it should be.  Because if sed accepted *either*
^M^J or ^J as a new line then how would we specifically handle
files that had a mix of these if we wanted to treat the two
differently?
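
Because the CR is just a character, a mixed file can be handled line
by line.  A minimal sketch, assuming a POSIX shell and sed; the name
tag_endings and the tag text are invented for the example:

```shell
cr=$(printf '\r')

# tag_endings (hypothetical helper): label each line by how it ended.
#   1st command: a CR right before end of line becomes " [CRLF]".
#   t:           if that substitution happened, skip the rest.
#   3rd command: otherwise the line ended in a bare LF.
tag_endings() {
    sed -e "s/${cr}\$/ [CRLF]/" -e 't' -e 's/$/ [LF]/'
}

printf 'dos line\r\nunix line\n' | tag_endings
# prints:
# dos line [CRLF]
# unix line [LF]
```

A sed that silently treated both endings as a newline could not tell
these two lines apart.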

Having programs try to do what they think you want rather than
what you specifically say may feel good when you are first
starting.  But pretty soon they get in the way when you try
to do something difficult.

> Cheers,
> --Jason V. C.
> --
> Revolution from Below! GPL the Constitution!

The Constitution already is open source.  Fortunately.
And most of the laws are.  Exception: "patriot act".

The comment below describes Bill Gates and clan...
--
Allen Brown
  work: Agilent Technologies      non-work: http://www.peak.org/~abrown/
        [EMAIL PROTECTED]             [EMAIL PROTECTED]
  Those who do not understand Unix are condemned to reinvent it, poorly.
    --- Henry Spencer

