on 02/09/2013 17:54 Andriy Gapon said the following:
> 
> re_format(7) says:
>      There are two special cases‡ of bracket expressions: the bracket expres‐
>      sions ‘[[:<:]]’ and ‘[[:>:]]’ match the null string at the beginning and
>      end of a word respectively.  A word is defined as a sequence of word
>      characters which is neither preceded nor followed by word characters.  A
>      word character is an alnum character (as defined by ctype(3)) or an
>      underscore.  This is an extension, compatible with but not specified by
>      IEEE Std 1003.2 (“POSIX.2”), and should be used with caution in software
>      intended to be portable to other systems.
> 
> However I observe the following:
> $ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
> xx
> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
> cd1 xx
> 
> In my opinion '[[:<:]]' should not affect how the pattern is matched in this 
> case.

It seems that the code works like this:
- first it matches "cd0 " and "removes" it
- then it passes the remainder, "cd1 xx", to the matcher with a flag saying
  that this is not a real start of the line
- thus the matching code
 o knows that this is not a real line start, so it cannot match [[:<:]]
   on that basis alone
 o does _not_ know which character preceded the start of the given
   substring, so it cannot tell whether [[:<:]] could match there

So matching fails.
I am not sure whether this is an internal problem of regex(3) or a problem
with how sed(1) uses regex(3).
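
To make the suspected sequence concrete, here is a toy model in Python. It is
only a sketch of the behaviour I am guessing at, not the actual sed(1) or
regex(3) code. The names delete_all, at_word_start and pass_prev_char are made
up for illustration, and the "notbol" flag merely stands in for whatever flag
the real code passes. With pass_prev_char off, the matcher is handed only the
remaining substring and refuses [[:<:]] at its first character. With it on,
the caller also supplies the preceding character.

```python
import re

# Body of the pattern without the word-start assertion: cd[0-9][^ ]* *
BODY = re.compile(r"cd[0-9][^ ]* *")


def is_word_char(ch):
    # Word character as in re_format(7): alnum or underscore.
    return ch.isalnum() or ch == "_"


def at_word_start(buf, i, notbol, prev_char):
    """Toy stand-in for [[:<:]] at offset i of the buffer given to the matcher."""
    if i >= len(buf) or not is_word_char(buf[i]):
        return False
    if i > 0:
        # Inside the buffer the left neighbour is visible.
        return not is_word_char(buf[i - 1])
    if not notbol:
        # Real start of the line: a word may begin here.
        return True
    if prev_char is None:
        # Not a real line start and no left context: refuse to match.
        # (Assumption: this is the behaviour hypothesised above.)
        return False
    return not is_word_char(prev_char)


def delete_all(line, pass_prev_char):
    """Toy model of s/[[:<:]]cd[0-9][^ ]* *//g applied to one line."""
    out = []
    start = 0
    while start < len(line):
        buf = line[start:]      # only the remainder is handed to the matcher
        notbol = start > 0      # tells the matcher this is not a line start
        prev = line[start - 1] if (pass_prev_char and start > 0) else None
        found = None
        for i in range(len(buf)):
            if at_word_start(buf, i, notbol, prev):
                m = BODY.match(buf, i)
                if m:
                    found = (i, m.end())
                    break
        if found is None:
            out.append(buf)     # nothing more to delete, keep the rest
            break
        out.append(buf[:found[0]])
        start += found[1]       # continue right after the deleted match
    return "".join(out)


print(delete_all("cd0 cd1 xx", pass_prev_char=False))  # prints "cd1 xx"
print(delete_all("cd0 cd1 xx", pass_prev_char=True))   # prints "xx"
```

The first call reproduces the output I observe, and the second gives the
output I would expect. That fits the explanation above, but it does not show
on which side of the sed(1)/regex(3) boundary the left context gets lost.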

-- 
Andriy Gapon
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current