On 16 September 2013 18:39, Roland Mainz <[email protected]> wrote:
> Hi!
>
> ----
>
> While doing some GB18030 testing I found a disturbing issue:
> A lot of calls to the |*mb*()| functions are done without thinking
> about the current shift state. The issue is that this state is a
> hidden global variable and may easily be overlooked (the issue that
> UTF-8 can recover from invalid shift states makes this worse since
> UTF-8 locales won't suffer from this problem) ... which causes
> problems for Shift-State depending encodings like
> GBK/GB18030/ShiftJis.
>
> My preferred solution would be to change the current libast mb API to
> always take a |mbstate_t| argument. This would fix this issue (by
> making the shift state explicit), fix issues with nesting calls, e.g.
> if we are in a specific shift state and then call a utility function
> which operates on a different string ... and fix thread-safeness
> issues with the "hidden" global variable containing the current shift
> state...

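For concreteness, the difference an explicit |mbstate_t| makes can be sketched in plain ISO C. This is only an illustration under stated assumptions: count_chars() and count_both() are hypothetical names, not libast, ksh93 or bash code, and the sketch uses the standard mbrtowc() rather than the libast mb API. Each scan owns its conversion state, so a nested scan of a different string cannot disturb the outer one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not libast or bash code): count the characters in
 * the first n bytes of s, decoding in the current LC_CTYPE locale.
 * The conversion state is supplied by the caller, so nothing lives in a
 * hidden global. Returns (size_t)-1 on an invalid byte sequence. */
static size_t count_chars(const char *s, size_t n, mbstate_t *ps)
{
    size_t count = 0;

    assert(s != NULL && ps != NULL);
    while (n > 0) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, s, n, ps);

        if (len == (size_t)-1)
            return (size_t)-1;   /* invalid byte sequence */
        if (len == (size_t)-2)
            break;               /* incomplete character, kept in *ps */
        if (len == 0)
            len = 1;             /* null character: step over one byte
                                    (a simplification for this sketch) */
        s += len;
        n -= len;
        count++;
    }
    return count;
}

/* Count two strings, giving each its own zero-initialised state. */
static size_t count_both(const char *a, const char *b)
{
    mbstate_t sa, sb;
    size_t na, nb;

    memset(&sa, 0, sizeof sa);   /* initial conversion state */
    memset(&sb, 0, sizeof sb);
    na = count_chars(a, strlen(a), &sa);
    nb = count_chars(b, strlen(b), &sb);
    if (na == (size_t)-1 || nb == (size_t)-1)
        return (size_t)-1;
    return na + nb;
}
```

A caller would zero an mbstate_t once per string and pass it to every call made on that string. The non-restartable functions such as mblen() and mbtowc() offer no such parameter and keep their state internally.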
Well, this may explain why ksh93 sometimes stumbles when it has to
process characters in encodings other than UTF-8. bash 4 handles this
flawlessly, but only because it uses
mbstate_t ps; memset (&ps, 0, sizeof (mbstate_t)); wcrtomb()
everywhere. Even a single call to an mb function without an mbstate_t
can render your whole application useless.

Q: Why doesn't POSIX deprecate the mb functions which do not take an
mbstate_t? The mistake ksh93 makes is easy to make and very hard to
rectify.

Wendy
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
