On 16 September 2013 18:39, Roland Mainz <[email protected]> wrote:
> Hi!
>
> ----
>
> While doing some GB18030 testing I found a disturbing issue:
> A lot of calls to the |*mb*()| functions are done without thinking
> about the current shift state. The issue is that this state is a
> hidden global variable and may easily be overlooked (the issue that
> UTF-8 can recover from invalid shift states makes this worse since
> UTF-8 locales won't suffer from this problem) ... which causes
> problems for Shift-State depending encodings like
> GBK/GB18030/ShiftJis.
>
> My preferred solution would be to change the current libast mb API to
> always take a |mbstate_t| argument. This would fix this issue (by
> making the shift state explicit), fix issues with nesting calls, e.g.
> if we are in a specific shift state and then call a utility function
> which operates on a different string ... and fix thread-safeness
> issues with the "hidden" global variable containing the current shift
> state...

Well, this may explain why ksh93 sometimes stumbles when it has to
process characters in encodings other than UTF-8. bash 4 handles
this flawlessly, but only because it uses mbstate_t ps; memset(&ps, 0,
sizeof(mbstate_t)); wcrtomb() everywhere. Even a single call to an mb
function without an mbstate_t can render your whole application useless.
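
To make the pattern concrete, here is a minimal sketch of the
explicit-state style on the decoding side, using mbrtowc() (the
counterpart of wcrtomb()). The helper name count_chars() is made up
for illustration; it is not taken from bash, ksh93 or libast. It
assumes the caller has already selected the locale with setlocale(),
and it treats an embedded NUL as a single byte, which is a
simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in the first n bytes of s,
 * decoding under the current LC_CTYPE locale.
 * Returns (size_t)-1 on an invalid or truncated sequence. */
size_t count_chars(const char *s, size_t n)
{
    mbstate_t ps;                 /* private shift state for this scan */
    size_t count = 0;

    assert(s != NULL);
    memset(&ps, 0, sizeof ps);    /* start in the initial shift state */

    while (n > 0) {
        /* Passing &ps keeps the hidden internal state out of the picture. */
        size_t len = mbrtowc(NULL, s, n, &ps);

        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;    /* invalid or incomplete sequence */
        if (len == 0)
            len = 1;              /* NUL decoded: step over one byte (simplification) */

        s += len;
        n -= len;
        count++;
    }
    return count;
}
```

Because ps lives on the stack of count_chars(), a nested call on a
different string, or a call from another thread, cannot disturb the
shift state of the scan in progress.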

Q: Why doesn't POSIX deprecate the mb functions which do not take an
mbstate_t? The mistake ksh93 makes is easy to make and hard to
rectify.

Wendy
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
