I had posed a question to the POSIX Austin Group related to this
and failed to report back to ast-developers

here is the relevant snippet, starting with a response from the group
and my comment

>> Maybe what you're confusing is the concept of unassigned Unicode
>> codepoints (a Unicode concept irrelevant to C/POSIX) and invalid
>> wchar_t values or illegal multibyte sequences (a C/POSIX concept). As
>> far as C/POSIX is concerned, a multibyte sequence is legal if and only
>> if it corresponds to a wchar_t value via mbrtowc, and conversely, a
>> wchar_t value is a valid character if and only if it corresponds to a
>> multibyte character via wcrtomb. These operations should be inverses;
>> in particular they should be defined on each other's ranges.
> 
> yes, the confusion started on some other threads which contained
> references to
>         int iswrune(wchar_t)
> which apparently tests for assigned codepoints
> 
> what you just pointed out is exactly what is needed for the POSIX tr
> implementation -- basically that unassigned codepoints do not come into play

basically the only tools an application has are:
        mbrtowc() to test for a valid multibyte sequence
        wcrtomb() to test for a valid wchar_t
iswrune() is a concept outside the scope of posix
any posix standard command that produces error messages inconsistent with
mbrtowc() or wcrtomb(), e.g., via iswrune(), is non-conforming
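
a minimal sketch of those two tests, plus the round trip the austin group
response describes. the helper names mb_is_valid(), wc_is_valid() and
wc_round_trips() are made up for illustration and are not taken from AST tr;
results depend on the locale in effect (LC_CTYPE), and each call starts from
the initial conversion state

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * hypothetical helper: nonzero if the n bytes at s begin with one
 * complete, legal multibyte character in the current locale
 */
int mb_is_valid(const char *s, size_t n)
{
        mbstate_t st;
        wchar_t wc;
        size_t r;

        memset(&st, 0, sizeof(st));     /* initial conversion state */
        r = mbrtowc(&wc, s, n, &st);
        /* (size_t)-1: illegal sequence, (size_t)-2: incomplete sequence */
        return (r != (size_t)-1 && r != (size_t)-2);
}

/*
 * hypothetical helper: nonzero if wc corresponds to a multibyte
 * character in the current locale
 */
int wc_is_valid(wchar_t wc)
{
        mbstate_t st;
        char buf[MB_LEN_MAX];

        memset(&st, 0, sizeof(st));
        return (wcrtomb(buf, wc, &st) != (size_t)-1);
}

/*
 * hypothetical helper: nonzero if wc survives wcrtomb() followed by
 * mbrtowc(), i.e. the two conversions act as inverses for this value
 */
int wc_round_trips(wchar_t wc)
{
        mbstate_t st;
        char buf[MB_LEN_MAX];
        wchar_t back;
        size_t n, r;

        memset(&st, 0, sizeof(st));
        n = wcrtomb(buf, wc, &st);
        if (n == (size_t)-1)
                return (0);
        memset(&st, 0, sizeof(st));
        r = mbrtowc(&back, buf, n, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
                return (0);
        return (back == wc);
}
```

none of these helpers consult iswrune() or any other character class, so
whether a codepoint is assigned in Unicode plays no part in the answer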

On Wed, 5 Jun 2013 02:30:56 +0200 Cedric Blancher wrote:
> On 18 April 2013 13:38, Roland Mainz <[email protected]> wrote:
> > On Wed, Apr 17, 2013 at 2:52 PM, Cedric Blancher
> > <[email protected]> wrote:
> >> Glenn, can you take a look at the posting from freebsd-standards? AST
> >> tr -C doesn't ignore unassigned code points as it should.
> > [snip]
> >
> > Grumpf... I think you're right...
> > ... the trouble is that not all platforms implement the |iswrune()|
> > function (see 
> > http://developer.apple.com/library/ios/#documentation/system/conceptual/manpages_iphoneos/man3/iswrune.3.html)
> > ...
> >
> > ... AFAIK (based on some testing on a FreeBSD system vs. Solaris) the
> > following |iswrune()| emulation code should work (and we need an iffe
> > probe for |iswrune()| and fall back to the emulation):
> > -- snip --
> > #include <stdlib.h>
> > #include <stdio.h>
> > #include <locale.h>
> > #include <wctype.h>
> >
> > static
> > int iswrune_emu(wint_t c)
> > {
> >         /*
> >          * we test |iswprint()| first because it has
> >          * usually the largest number of members and
> >          * the fastest implementation
> >          */
> >         if (iswprint(c))
> >                 return (1);
> >         if (iswalnum(c) ||
> >                 iswcntrl(c) ||
> >                 iswdigit(c) ||
> >                 iswgraph(c) ||
> >                 iswpunct(c) ||
> >                 iswspace(c) ||
> >                 iswxdigit(c) ||
> >                 iswblank(c) ||
> >                 iswlower(c) ||
> >                 iswupper(c))
> >                 return (1);
> >
> >         return (0);
> > }
> >
> > int main(int ac, char *av[])
> > {
> >         wint_t i;
> >
> >         setlocale(LC_ALL, "");
> >
> >         puts("#start.");
> >
> >         for (i=0x3000 ; i < 0x4000 ; i++)
> >         {
> >                 if (!iswrune_emu(i))
> >                 {
> >                         printf("code point %lx not assigned.\n",
> >                                 (unsigned long)i);
> >                 }
> >         }
> >
> >         puts("#done");
> >         return (EXIT_SUCCESS);
> > }
> > -- snip --
> > (note that |iswprint()| is explicitly separated out to highlight the
> > performance optimisation)
> >
> > Erm... Glenn... what do you think ?
> >
> > ----
> >
> > Bye,
> > Roland
> >
> > P.S.: If we use the emulation then AST regex should (IMO) still
> > support [:rune:] (through the emulation) ...

> Glenn, are you going to put this fix into AST tr for the next alpha?
> IMO filtering unassigned code points is required for a standard
> conforming tr -C implementation.

> Ced
> -- 
> Cedric Blancher <[email protected]>
> Institute Pasteur

_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
