I had posed a question to the POSIX Austin Group related to this
and failed to report back to ast-developers.
here is the relevant snippet, starting with a response from the group,
followed by my comment:
>> Maybe what you're confusing is the concept of unassigned Unicode
>> codepoints (a Unicode concept irrelevant to C/POSIX) and invalid
>> wchar_t values or illegal multibyte sequences (a C/POSIX concept). As
>> far as C/POSIX is concerned, a multibyte sequence is legal if and only
>> if it corresponds to a wchar_t value via mbrtowc, and conversely, a
>> wchar_t value is a valid character if and only if it corresponds to a
>> multibyte character via wcrtomb. These operations should be inverses;
>> in particular they should be defined on each other's ranges.
>
> yes, there is confusion that started on some other threads which contained
> references to
> int iswrune(wchar_t)
> which apparently tests for assigned codepoints
>
> what you just pointed out is exactly what is needed for the POSIX tr
> implementation -- basically that unassigned codepoints do not come into play
basically the only tools an application has are:
    to test for a valid multibyte sequence: mbrtowc()
    to test for a valid wchar_t: wcrtomb()
iswrune() is a concept outside the scope of posix
any posix standard command that produces error messages inconsistent with
mbrtowc() or wcrtomb(), e.g., via iswrune(), is non-conforming
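
as a rough illustration of the point above, here is a minimal sketch of the
two checks in C. the helper names valid_mb(), valid_wc() and roundtrips() are
made up for this example and are not part of posix or ast. the results depend
on the current LC_CTYPE locale, and the sketch assumes it is enough to start
every conversion from a fresh mbstate_t.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * hypothetical helper (not a posix or ast function):
 * return 1 if the first n bytes of s start with a complete,
 * valid multibyte character in the current locale, else 0
 */
static int valid_mb(const char *s, size_t n)
{
    mbstate_t st;
    wchar_t wc;
    size_t r;

    memset(&st, 0, sizeof(st));     /* initial conversion state */
    r = mbrtowc(&wc, s, n, &st);
    /* (size_t)-1: illegal sequence, (size_t)-2: incomplete sequence */
    return r != (size_t)-1 && r != (size_t)-2;
}

/*
 * hypothetical helper: return 1 if wc can be converted to a
 * multibyte character in the current locale, else 0
 */
static int valid_wc(wchar_t wc)
{
    char buf[MB_LEN_MAX];           /* large enough for any locale */
    mbstate_t st;

    memset(&st, 0, sizeof(st));
    return wcrtomb(buf, wc, &st) != (size_t)-1;
}

/*
 * hypothetical helper: return 1 if wc -> multibyte -> wchar_t
 * gives back wc, i.e. the two conversions act as inverses for wc
 */
static int roundtrips(wchar_t wc)
{
    char buf[MB_LEN_MAX];
    mbstate_t st;
    wchar_t back;
    size_t n, r;

    memset(&st, 0, sizeof(st));
    n = wcrtomb(buf, wc, &st);
    if (n == (size_t)-1)
        return 0;
    memset(&st, 0, sizeof(st));
    r = mbrtowc(&back, buf, n, &st);
    if (r == (size_t)-1 || r == (size_t)-2)
        return 0;
    return back == wc;
}
```

a caller would normally run setlocale(LC_CTYPE, "") (from <locale.h>) first;
without it the checks are made against the "C" locale. note that none of
these helpers asks whether a code point is assigned, which is the whole point.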
On Wed, 5 Jun 2013 02:30:56 +0200 Cedric Blancher wrote:
> On 18 April 2013 13:38, Roland Mainz <[email protected]> wrote:
> > On Wed, Apr 17, 2013 at 2:52 PM, Cedric Blancher
> > <[email protected]> wrote:
> >> Glenn, can you take a look at the posting from freebsd-standards? AST
> >> tr -C doesn't ignore unassigned code points as it should.
> > [snip]
> >
> > Grumpf... I think you're right...
> > ... the trouble is that not all platforms implement the |iswrune()|
> > function (see
> > http://developer.apple.com/library/ios/#documentation/system/conceptual/manpages_iphoneos/man3/iswrune.3.html)
> > ...
> >
> > ... AFAIK (based on some testing on a FreeBSD system vs. Solaris) the
> > following |iswrune()| emulation code should work (and we need an iffe
> > probe for |iswrune()| and a fall-back to the emulation):
> > -- snip --
> > #include <stdlib.h>
> > #include <stdio.h>
> > #include <locale.h>
> > #include <wctype.h>
> >
> > static
> > int iswrune_emu(wint_t c)
> > {
> >     /*
> >      * we test |iswprint()| first because it usually
> >      * has the largest number of members and
> >      * the fastest implementation
> >      */
> >     if (iswprint(c))
> >         return (1);
> >     if (iswalnum(c) ||
> >         iswcntrl(c) ||
> >         iswdigit(c) ||
> >         iswgraph(c) ||
> >         iswpunct(c) ||
> >         iswspace(c) ||
> >         iswxdigit(c) ||
> >         iswblank(c) ||
> >         iswlower(c) ||
> >         iswupper(c))
> >         return (1);
> >
> >     return (0);
> > }
> >
> > int main(int ac, char *av[])
> > {
> >     wint_t i;
> >
> >     setlocale(LC_ALL, "");
> >
> >     puts("#start.");
> >
> >     for (i=0x3000 ; i < 0x4000 ; i++)
> >     {
> >         if (!iswrune_emu(i))
> >         {
> >             /* %lx expects an unsigned long argument */
> >             printf("code point %lx not assigned.\n",
> >                 (unsigned long)i);
> >         }
> >     }
> >
> >     puts("#done");
> >     return (EXIT_SUCCESS);
> > }
> > -- snip --
> > (note that |iswprint()| is explicitly separated out to highlight the
> > performance optimisation)
> >
> > Erm... Glenn... what do you think ?
> >
> > ----
> >
> > Bye,
> > Roland
> >
> > P.S.: If we use the emulation then AST regex should (IMO) still
> > support [:rune:] (through the emulation) ...
> Glenn, are you going to put this fix into AST tr for the next alpha?
> IMO filtering unassigned code points is required for a standard
> conforming tr -C implementation.
> Ced
> --
> Cedric Blancher <[email protected]>
> Institute Pasteur