On 5 June 2013 07:03, Glenn Fowler <[email protected]> wrote:
>
> I had posed a question to the posix austin group related to this
> and failed to report back to ast-developers
>
> here is the relevant snippet, starting with a response from the group
> and my comment
>
>>> Maybe what you're confusing is the concept of unassigned Unicode
>>> codepoints (a Unicode concept irrelevant to C/POSIX) and invalid
>>> wchar_t values or illegal multibyte sequences (a C/POSIX concept). As
>>> far as C/POSIX is concerned, a multibyte sequence is legal if and only
>>> if it corresponds to a wchar_t value via mbrtowc, and conversely, a
>>> wchar_t value is a valid character if and only if it corresponds to a
>>> multibyte character via wcrtomb. These operations should be inverses;
>>> in particular they should be defined on each other's ranges.
>>
>> yes there is confusion started on some other threads which contained
>> references to
>>     int iswrune(wchar_t)
>> which apparently tests for assigned codepoints
>>
>> what you just pointed out it is exactly what is needed for the POSIX tr
>> implementation -- basically that unassigned codepoints do not come into play
>
> basically the only tools an application has for:
>     valid multibyte sequence is mbrtowc()
>     valid wchar_t is wcrtomb()
> iswrune() is a concept outside the scope of posix
> any posix standard command that produces error messages inconsistent with
> mbrtowc() or wcrtomb(), e.g., via iswrune(), is non-conforming
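To make the rule quoted above concrete, here is a minimal sketch in C of the two checks as I read them. It assumes the current locale's encoding and a fresh conversion state for each call; the helper names mb_is_valid() and wc_is_valid() are made up for illustration and come from neither POSIX nor ast.

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if the first n bytes of s start with a
 * complete, legal multibyte character in the current locale, else 0. */
int mb_is_valid(const char *s, size_t n)
{
    mbstate_t st;
    wchar_t wc;
    size_t r;

    memset(&st, 0, sizeof st);      /* start from the initial shift state */
    r = mbrtowc(&wc, s, n, &st);
    /* (size_t)-1: illegal sequence; (size_t)-2: incomplete sequence */
    return r != (size_t)-1 && r != (size_t)-2;
}

/* Hypothetical helper: returns 1 if wc can be converted to a multibyte
 * character in the current locale, else 0. */
int wc_is_valid(wchar_t wc)
{
    mbstate_t st;
    char buf[MB_LEN_MAX];           /* large enough for any locale */

    memset(&st, 0, sizeof st);
    return wcrtomb(buf, wc, &st) != (size_t)-1;
}
```

Neither helper consults iswrune(), so whether an unassigned codepoint passes depends only on what the locale's mbrtowc() and wcrtomb() accept.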

OK. How do we deal with unassigned code points then? I think FreeBSD tr
removing them from a range is valid, since they are not characters. Or
doesn't that fit into POSIX?

Ced
-- 
Cedric Blancher <[email protected]>
Institute Pasteur
