On 3/25/09, Glenn Fowler <[email protected]> wrote:
>
>  On Wed, 25 Mar 2009 17:08:11 +0100 Jennifer Pioch wrote:
>  > On 3/24/09, Glenn Fowler <[email protected]> wrote:
>  > >
>  > >  here's what the regexp page says:
>  > >
>  > >   The Simple Regular Expressions described below differ from the
>  > >   Internationalized Regular Expressions described on the regex(5) manual
>  > >   page in the following ways:
>  > >
>  > >     * only Basic Regular Expressions are supported
>  > >     * the Internationalization features--character class, equivalence class,
>  > >       and multi-character collation--are not supported.
>  > >
>  > >  if these are indeed the only differences then I can add a REG_NOI18N
>  > >  regcomp() flag -- but I need verification of exactly what that means
>  > >  does that mean that it is byte based, or does . match a multibyte char?
>
>  > It supports and matches multibyte characters, only supports Basic
>  > Regular Expressions and does not support the extended set of character
>  > *classes*.
>  > The name REG_NOI18N would be misleading; it's better to call it
>  > REG_REGEXP (basic regexp).
>
>
> I was concerned about multibyte because the regexp text refers to byte
>  instead of character in some places

Solaris regexp matches multibyte characters, but a REG_BINARY option
that matches byte by byte instead of character by character may be
useful for matching binary data.

>  can you verify how the regexp grep works in the C and multibyte locales
>  try with a file that has one line with one multibyte char
>  and try this pattern
>         '^.$'

$ printf "a\nä\nb\n" | /usr/bin/grep '^.$'
a
ä
b
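
To make the character-versus-byte distinction concrete, here is a rough
sketch in Python. It is not the Solaris or AST regex code; the helper
names are made up for illustration, and UTF-8 is assumed as the multibyte
encoding. Matching on text treats 'ä' as one character, while matching on
the encoded bytes treats it as two, which is roughly what a byte-based
mode such as the proposed REG_BINARY would do.

```python
import re

PATTERN = r"^.$"  # exactly one "character" on the line


def match_as_characters(lines):
    """Character-based matching: '.' consumes one code point."""
    rx = re.compile(PATTERN)
    return [line for line in lines if rx.match(line)]


def match_as_bytes(lines, encoding="utf-8"):
    """Byte-based matching: '.' consumes exactly one byte.

    The encoding is an assumption; in UTF-8, U+00E4 is two bytes.
    """
    rx = re.compile(PATTERN.encode("ascii"))
    return [line for line in lines if rx.match(line.encode(encoding))]


if __name__ == "__main__":
    lines = ["a", "\u00e4", "b"]  # same input as the printf above
    print(match_as_characters(lines))  # all three lines match
    print(match_as_bytes(lines))       # the multibyte line drops out
```

With character-based matching all three lines match, as in the grep
output above; with byte-based matching only "a" and "b" do.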

Jenny
-- 
Jennifer Pioch, Uni Frankfurt
