On 2025-6-26 16:15, Kirill Makurin wrote:
Hello,

I was investigating wc*tomb* and mb*towc* functions in CRT and comparing their 
behavior to other implementations.

Take the following example:

```
mbrtowc (NULL, s, 1, ps)
mbrtowc (NULL, s + 1, 1, ps)
```

Here, `s` is a pointer to a multibyte (DBCS) character, but since n==1, the first
mbrtowc call returns (size_t)-2 and updates ps. The next call completes the
conversion of the multibyte character. What's the return value? CRT returns 2
while glibc returns 1.

It seems to me that ISO C and POSIX specify different behavior for this case.


Please notice this line in POSIX-2024 [1]:

   [CX] [Option Start] The functionality described on this reference page is
   aligned with the ISO C standard. Any conflict between the requirements
   described here and the ISO C standard is unintentional. This volume of
   POSIX.1-2024 defers to the ISO C standard. [Option End]

As far as the specification of the return value is concerned, the only difference between ISO C and POSIX is that ISO C says 'multibyte character' while POSIX says 'character'.


   between 1 and n inclusive
   if the next n or fewer bytes complete a valid multibyte character (which is
   the value stored); the value returned is the number of bytes that complete
   the multibyte character.

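This reads to me like 'the number of bytes' is the number of bytes within 'the next n or fewer bytes'. In other words, the function will never return a value that (after being cast to `ptrdiff_t`) is greater than n. Under that reading, the second call in the example above, where n == 1, would return 1.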
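
For anyone who wants to compare implementations, the two-call sequence from the quoted message could be generalized into a small harness like the one below. This is only a sketch: `feed_bytewise` is a hypothetical helper name, not part of any library, and it assumes the caller has already selected the locale of interest with `setlocale` (DBCS locale names are platform-specific, so none is hard-coded here).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: pass `len` bytes of `s` to mbrtowc one byte at a
 * time (n == 1 on every call) and record each return value in ret[i].
 * Stops early if mbrtowc reports an encoding error.
 * Returns the number of calls made.
 */
static size_t feed_bytewise(const char *s, size_t len, size_t *ret)
{
    mbstate_t ps;
    size_t i;

    assert(s != NULL && ret != NULL);
    memset(&ps, 0, sizeof ps);          /* initial conversion state */

    for (i = 0; i < len; i++) {
        ret[i] = mbrtowc(NULL, s + i, 1, &ps);
        if (ret[i] == (size_t)-1)
            return i + 1;               /* encoding error, stop here */
    }
    return len;
}
```

With a DBCS locale active and `s` pointing at a two-byte character, `ret[0]` should be (size_t)-2, and `ret[1]` is the value in question: 2 according to the CRT behavior reported above, 1 with glibc and under the reading of the standard text given here.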


[1] https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/mbrtowc.html




--
Best regards,
LIU Hao
