I think going back to mbrtowc_cp is a fair tradeoff performance and simplicity.

Speaking of thread safety, we already call two functions, ___lc_codepage_func 
or ___mb_cur_max_func, to obtain information about current locale and we 
already vulnerable to a call to setlocale() in between. I would preserve usage 
of isleadbyte at this point to reduce number of changes.
________________________________
From: Pali Rohár <pali.ro...@gmail.com>
Sent: Thursday, September 25, 2025 12:53 AM
To: Kirill Makurin <maiddais...@outlook.com>
Cc: mingw-w64-public <mingw-w64-public@lists.sourceforge.net>
Subject: Re: Correctly handle NULL arguments to mbrtowc

There are more mingw-w64 header files which are already including
windows.h. See: git grep 'windows.h' 'mingw-w64-crt/**.h'. I'm just
worried that in future new headers with windows.h can be added or those
usage can be extended. That is why I think it is safer to define
WIN32_LEAN_AND_MEAN before inclusion of all header files.

Patch 4: I would rather not change meaning of the msvcrt's and UCRT's
mbstate_t internal structure. It really looks like that it can bring
incompatibilities. mbstate_t is another example of structure used
internally by msvcrt / UCRT which has different layout in different CRT
libraries.

About mbrtowc_cp: For sure it is not hard, just reverting all those
changes (or reverting them partially) would achieve it easily.
Here I think is rather important to define a goal, otherwise we can end
in a loop of apply-revert-modify-apply-revert-...

If the goal is performance then we can provide different implementation
for crtdll builds of those functions. For regular msvcrt and UCRT builds
there should be a performance issue with current ___lc_codepage_func
usage (I guess it is just a function which returns value from
thread-local-storage).

If the goal is thread safety here, then we should properly look what can
cause issues and whether it is possible to fix them. Because right now
I'm not sure if it would be possible to fix all of them for that crtdll
build and also having just one implementation for all builds.

Or the goal can be one unified and simple implementation at the cost of
possible performance issues in some builds (e.g. crtdll).

On Wednesday 24 September 2025 13:23:53 Kirill Makurin wrote:
> I don't think that any CRT header file would include windows.h, so I think it 
> shouldn't be an issue.
>
> I agree that 4th patch may be a little bit of overkill and results in 
> incompatible usage of mbstate_t. This patch can be safely skipped if anyone 
> thinks it's a bad idea.
>
> Returning mbrtowc_cp is not hard. I think we may need to use IsDBCSLeadByte 
> instead of isleadbyte once again. If we would call isleadbyte from 
> mbrtowc_cp, there exists a little chance that some other thread may change 
> global locale after we called ___lc_codepage_func or ___mb_cur_max_func but 
> before isleadbyte, leading to incorrect results.
> ________________________________
> From: Pali Rohár <pali.ro...@gmail.com>
> Sent: Wednesday, September 24, 2025 10:10 PM
> To: Kirill Makurin <maiddais...@outlook.com>
> Cc: mingw-w64-public <mingw-w64-public@lists.sourceforge.net>
> Subject: Re: Correctly handle NULL arguments to mbrtowc
>
> Hello, I have few comments:
>
> PATCH 3:
> I would prefer to put '#define WIN32_LEAN_AND_MEAN' at beginning of the
> file before any other #include. This should prevent issues in future if
> some other header file is included before that define and would
> transitively include also windows.h.
>
> PATCH 4:
> About storing "unsigned short cp" into mbstate_t. This seems to be
> incompatible change with msvcrt / UCRT. And I'm not sure if this is a
> good idea. For example it can break mixing of mingw-w64 mbrtowc function
> with any other msvcrt / UCRT function which is using mbstate_t.
>
> mbrtowc_cp:
> My original idea was to simplify code, remove unnecessary parts and do
> not statically link into every executable unused code.
> But I understand that for crtdll.dll the ___lc_codepage_func
> implementation can be slow. I did not think about it originally.
> If this is really a problem then we can revert it back, but ideally as
> internal symbol prefixed e.g. by mingw term.
>
> On Wednesday 24 September 2025 12:24:43 Kirill Makurin wrote:
> > Current implementation of mbrtowc incorrectly handles situations when its 
> > first or second argument is NULL. See POSIX specification[1].
> >
> > When first argument is NULL it must behave as usual, except that it 
> > produces no output. Currently, it does not update conversion state in this 
> > case.
> >
> > When second argument is NULL, it must act as if was called as mbrtowc 
> > (NULL, "", 1, state). Currently it simply puts `state` to initial 
> > conversion state.
> >
> > Fourth patch in this series makes detection of invalid conversion state 
> > (optional POSIX feature) more robust. This may be a little overkill, but I 
> > see no harm in doing so.
> >
> > Pali, remember there previously was mbrtowc_cp? I think we could return it. 
> > crtdll.dll's ___lc_codepage_func seems to do the trick with strchr 
> > (setlocale (LC_CTYPE, NULL), '.') and atoi to convert code page to int. I 
> > find it unnecessary expensive to do on each call to mbrtowc from mbsrtowcs.
> >
> > - Kirill Makurin
> >
> > [1] https://pubs.opengroup.org/onlinepubs/9799919799/functions/mbrtowc.htm

_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

Reply via email to