On 5 June 2013 07:03, Glenn Fowler <[email protected]> wrote:
>
> I had posed a question to the posix austin group related to this
> and failed to report back to ast-developers
>
> here is the relevant snippet, starting with a response from the group
> and my comment
>
>>> Maybe what you're confusing is the concept of unassigned Unicode
>>> codepoints (a Unicode concept irrelevant to C/POSIX) and invalid
>>> wchar_t values or illegal multibyte sequences (a C/POSIX concept). As
>>> far as C/POSIX is concerned, a multibyte sequence is legal if and only
>>> if it corresponds to a wchar_t value via mbrtowc, and conversely, a
>>> wchar_t value is a valid character if and only if it corresponds to a
>>> multibyte character via wcrtomb. These operations should be inverses;
>>> in particular they should be defined on each other's ranges.
>>
>> yes there is confusion started on some other threads which contained
>> references to
>>         int iswrune(wchar_t)
>> which apparently tests for assigned codepoints
>>
>> what you just pointed out is exactly what is needed for the POSIX tr
>> implementation -- basically that unassigned codepoints do not come into play
>
> basically the only tools an application has are:
>         valid multibyte sequence: mbrtowc()
>         valid wchar_t: wcrtomb()
> iswrune() is a concept outside the scope of posix
> any posix standard command that produces error messages inconsistent with
> mbrtowc() or wcrtomb(), e.g., via iswrune(), is non-conforming
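A minimal sketch of the two checks described above. It assumes only standard C: the validity test for a multibyte sequence is whether mbrtowc() converts it, and the test for a wchar_t is whether wcrtomb() converts it, both under whatever LC_CTYPE locale the caller selected with setlocale(). The helper names mb_is_valid() and wc_is_valid() are hypothetical and do not come from ast or any other tr implementation. No "assigned code point" table (iswrune() or similar) is consulted.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>   /* setlocale(); callers must select LC_CTYPE first */
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: true if the n bytes at s form exactly one valid
 * multibyte character in the current LC_CTYPE locale, judged only by
 * mbrtowc(). */
static bool mb_is_valid(const char *s, size_t n)
{
    mbstate_t st;
    wchar_t wc;

    /* mbrtowc() with s == NULL means "reset the state", not "test" */
    assert(s != NULL);
    memset(&st, 0, sizeof st);
    size_t r = mbrtowc(&wc, s, n, &st);
    if (r == (size_t)-1 || r == (size_t)-2)
        return false;           /* illegal or incomplete sequence */
    /* r == 0 means the null character, which is a single byte */
    return (r == 0 ? 1 : r) == n;
}

/* Hypothetical helper: true if wc is a valid wide character in the
 * current LC_CTYPE locale, judged only by wcrtomb(). */
static bool wc_is_valid(wchar_t wc)
{
    char buf[MB_LEN_MAX];
    mbstate_t st;

    memset(&st, 0, sizeof st);
    return wcrtomb(buf, wc, &st) != (size_t)-1;
}
```

With glibc or musl in a UTF-8 locale, for example, an unassigned code point such as U+0378 should still pass wc_is_valid(), because the UTF-8 conversion does not look at Unicode assignment, while a lone 0xFF byte fails mb_is_valid().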

OK

How do we deal with unassigned code points, then? I think FreeBSD tr's
removal of them from a range is valid, since they are not characters. Or
doesn't that fit into POSIX?
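One way to read Glenn's rule for tr ranges, sketched under assumptions: the range lo-hi is expanded by numeric wchar_t value (an assumption made for illustration; how tr orders ranges outside the POSIX locale is a separate question), and a value is dropped only when wcrtomb() rejects it in the current locale. Unassigned code points that wcrtomb() encodes are kept, which is the opposite of an iswrune()-style filter. The helper expand_range() is hypothetical.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>   /* setlocale(); callers must select LC_CTYPE first */
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not taken from any tr implementation.
 * Expands lo-hi by wchar_t value and keeps only the values wcrtomb()
 * accepts in the current LC_CTYPE locale. Stores up to cap values in out
 * and returns how many valid values the range contains (may exceed cap). */
static size_t expand_range(wchar_t lo, wchar_t hi, wchar_t *out, size_t cap)
{
    char buf[MB_LEN_MAX];
    size_t count = 0;

    assert(out != NULL || cap == 0);
    /* Use a wider loop counter so hi == WCHAR_MAX cannot overflow. */
    for (long long v = lo; v <= (long long)hi; v++) {
        mbstate_t st;

        memset(&st, 0, sizeof st);
        if (wcrtomb(buf, (wchar_t)v, &st) == (size_t)-1)
            continue;           /* not a character in this locale: skip */
        if (count < cap)
            out[count] = (wchar_t)v;
        count++;
    }
    return count;
}
```

In a UTF-8 locale on glibc or musl, the range U+D7FF-U+E000 should shrink to its two endpoints, since wcrtomb() rejects the surrogates between them, whereas the unassigned pair U+0378-U+0379 survives intact.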

Ced
-- 
Cedric Blancher <[email protected]>
Institute Pasteur
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
