Hello,

I have mentioned a few times that I've been working on a library which
implements POSIX functions for native Windows. A few days ago I got it to the
point where I felt confident enough to publish it in its current state:

https://github.com/maiddaisuki/posix32

Here's a summary of what the library currently implements (mostly 
locale-related stuff):

- locale.h functions
- langinfo.h (nl_langinfo function)
- ctype.h and wctype.h functions
- string.h functions and their wchar.h equivalents
- strings.h functions (str[n]casecmp) and their wchar.h equivalents
- uchar.h functions (including c8rtomb/mbrtoc8, with both MSVCRT and UCRT)
- stdlib.h/wchar.h C89/C95 conversion functions

Notably, it still does not implement:

- stdlib.h string-to-number conversion functions (which are locale-dependent)
- time.h functions (strftime/wcsftime, tzset)
- stdio.h functions and their wchar.h equivalents

1. locale.h

POSIX newlocale, uselocale and getlocalename_l are implemented for all CRTs.
`newlocale` can create a locale_t object which uses UTF-8 even with CRTs which
do not support UTF-8, and you can use locale-specific functions such as
`mbrtoc32_l` (an extension, not POSIX) with such objects.

The library uses its own parser for strings passed to `setlocale` and
`newlocale`. It can parse any string that the CRT's setlocale would accept,
including Windows locale names, as well as strings in the
"ll[_CC][.CHARSET][@MODIFIER]" format described by POSIX. For example, you can
pass "ca_ES@valencia" or "sr_RS@latin", or even nonsense like
"ja_US@cyrillic"/"ja-Latn-US", which will be parsed and resolved to "ja-JP".
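
To illustrate the "ll[_CC][.CHARSET][@MODIFIER]" format, here is a rough sketch
of how such a string can be split into its four parts. This is not the
library's actual parser: `struct locale_parts`, `copy_part` and
`parse_posix_locale` are names made up for this example, and the sketch does
no validation or resolution of the parts against real locales.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the parts of "ll[_CC][.CHARSET][@MODIFIER]". */
struct locale_parts
{
  char language[16];
  char territory[16];
  char charset[32];
  char modifier[32];
};

/* Copy LEN bytes of SRC into DST (capacity CAP) and terminate the result.
   Returns 0 on success, -1 if the piece does not fit.  */
static int
copy_part (char *dst, size_t cap, const char *src, size_t len)
{
  if (len >= cap)
    return -1;
  memcpy (dst, src, len);
  dst[len] = '\0';
  return 0;
}

/* Split NAME into its parts; missing optional parts become empty strings.
   Returns 0 on success, -1 if the language is empty or a part is too long.  */
int
parse_posix_locale (const char *name, struct locale_parts *out)
{
  const char *end = name + strlen (name);
  /* "@MODIFIER" is always last, so look for it first.  */
  const char *at = strchr (name, '@');
  const char *body_end = at ? at : end;
  /* ".CHARSET" may only appear before the modifier.  */
  const char *dot = memchr (name, '.', (size_t) (body_end - name));
  const char *terr_end = dot ? dot : body_end;
  /* "_CC" may only appear before the charset.  */
  const char *us = memchr (name, '_', (size_t) (terr_end - name));
  const char *lang_end = us ? us : terr_end;

  memset (out, 0, sizeof *out);

  if (lang_end == name)
    return -1;
  if (copy_part (out->language, sizeof out->language, name,
                 (size_t) (lang_end - name)) != 0)
    return -1;
  if (us && copy_part (out->territory, sizeof out->territory, us + 1,
                       (size_t) (terr_end - (us + 1))) != 0)
    return -1;
  if (dot && copy_part (out->charset, sizeof out->charset, dot + 1,
                        (size_t) (body_end - (dot + 1))) != 0)
    return -1;
  if (at && copy_part (out->modifier, sizeof out->modifier, at + 1,
                       (size_t) (end - (at + 1))) != 0)
    return -1;
  return 0;
}
```

With this sketch, "ca_ES@valencia" splits into language "ca", territory "ES",
an empty charset and modifier "valencia". Resolving the parts to an actual
Windows locale is a separate step which is not shown here.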

".CHARSET" may also specify character set names such as "ISO-8859-1" instead of
code page numbers, so that "en_US.ISO-8859-1" or "ja_JP.EUC-JP" will do exactly
what you expect them to do.

`setlocale (..., "")` and `newlocale (..., "", ...)` use the LC_* and LANG
environment variables if they are set, falling back to the user's default
locale if not.
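
The lookup can be sketched as follows. The order used here is the one POSIX
describes (LC_ALL first, then the category's own variable such as LC_CTYPE,
then LANG, skipping variables which are unset or empty); that the library
follows exactly this order is my assumption for this example.
`first_nonempty` and `locale_from_env` are made-up names, not library
functions.

```c
#include <stdlib.h>

/* Return the first candidate which is set and non-empty, or NULL.  */
const char *
first_nonempty (const char *lc_all, const char *lc_category, const char *lang)
{
  const char *candidates[3];
  size_t i;

  candidates[0] = lc_all;
  candidates[1] = lc_category;
  candidates[2] = lang;

  for (i = 0; i < 3; ++i)
    if (candidates[i] != NULL && candidates[i][0] != '\0')
      return candidates[i];

  return NULL;
}

/* CATEGORY_VAR is the name of the category's variable, e.g. "LC_CTYPE".
   A NULL result means "fall back to the user's default locale".  */
const char *
locale_from_env (const char *category_var)
{
  return first_nonempty (getenv ("LC_ALL"), getenv (category_var),
                         getenv ("LANG"));
}
```

For example, `locale_from_env ("LC_CTYPE")` returns the value of LC_ALL if it
is set and non-empty, otherwise LC_CTYPE, otherwise LANG.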

2. string.h

The library provides a few replacements:

- strchr
- strrchr
- strstr
- strpbrk
- strspn
- strcspn
- strtok

These replacements operate correctly on multibyte strings. The library also
implements its own `strtok_r` and `strndup`.
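
To show why this matters, here is a minimal sketch of a multibyte-aware
`strchr`. To keep it self-contained it hardcodes the lead-byte ranges of
Shift-JIS (code page 932) instead of consulting the active locale, so it only
illustrates the idea and is not how the library does it; `is_sjis_lead` and
`sjis_strchr` are made-up names.

```c
#include <stddef.h>
#include <string.h>

/* Lead-byte ranges of Shift-JIS (code page 932).  */
static int
is_sjis_lead (unsigned char c)
{
  return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Find single-byte character CH in Shift-JIS string S.  Trail bytes of
   double-byte characters are skipped, so they can never produce a match.  */
char *
sjis_strchr (const char *s, char ch)
{
  const unsigned char *p = (const unsigned char *) s;

  while (*p != '\0')
    {
      if (is_sjis_lead (*p) && p[1] != '\0')
        {
          /* Double-byte character: step over both bytes.  */
          p += 2;
        }
      else
        {
          if (*p == (unsigned char) ch)
            return (char *) p;
          p += 1;
        }
    }

  /* Like strchr, searching for '\0' finds the terminator.  */
  return ch == '\0' ? (char *) p : NULL;
}
```

In Shift-JIS, the katakana letter "ソ" is encoded as bytes 0x83 0x5C, and 0x5C
is also the backslash. For the string "\x83\x5C" "\\dir", a bytewise
`strchr (s, '\\')` stops at the trail byte of "ソ" (s + 1), while
`sjis_strchr (s, '\\')` returns the real backslash (s + 2).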

3. stdlib.h/wchar.h/uchar.h conversion functions

The library provides replacements for all conversion functions, as well as
locale-specific versions for use with locale_t. It also implements the POSIX
`wcsnrtombs` and `mbsnrtowcs` functions, and the C23 `mbrtoc8` and `c8rtomb`
functions.

One notable difference from the CRT's wc*tomb* functions is that the library's
versions do not allow best-fit conversions. Best-fit conversions are dangerous
when it comes to things such as filenames.

The library's uchar.h functions use the active locale, unlike the CRT's, which
always use UTF-8 for conversion.

- Kirill Makurin

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public