On Mon, Aug 5, 2013 at 4:13 AM, Roland Mainz <[email protected]> wrote:
> Attached (as "astksh20130727_printf_w_gb18030_001.diff.txt") is a
> prototype patch which handles two issues related to GB18030:
>
> 1. Currently ksh93 supports "\u[codepoint]" in printf(1) to write a
> Unicode character to stdout. However, the current code assumes that
> |wchar_t| in the current locale represents a Unicode code point...
> which isn't true for all locales (for example, zh_CN.GB18030 on Solaris
> uses GB18030 codepoints for |wchar_t| and not Unicode codepoints). In
> that case printf "\u[codepoint]" doesn't work.
> The patch fixes this by using |iconv()| to convert between the UTF32
> codepoint value and the locale's character set. If the requested
> character cannot be represented in the current locale/encoding, printf
> "\u[codepoint]" will return an empty string.
>
> 2. The other issue is that some users greatly wish to use the
> codepoint values of their locale and _not_ the Unicode codepoint
> value. Therefore the patch adds printf "\w[codepoint]" so that a
> codepoint can be specified using the |wchar_t| value.
>
> Note that the patch is a _prototype_ ... if the general idea is OK
> I'll craft a better patch...
>
> * Questions (mainly for David&&Glenn):
> - Is the patch OK so far ?
> - Does libast have any code to detect whether the locale is a Unicode
> locale ?
> - Is there any reason that printf(1)'s "%b" format does not support
> "\u[codepoint]", e.g. bug or feature ? :-)
> - Somehow I can't include <iconv.h> ... it seems the AST <iconv.h>
> header is built later than src/lib/libast/string/chresc.c, which causes
> the build to fail... any idea why ?
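
For readers without the patch at hand, the |iconv()| approach from item 1 can be sketched roughly as below. This is only an illustration and not the code from the attached diff: the helper name codepoint_to_charset() and its interface are hypothetical, it uses the plain POSIX <iconv.h> rather than the AST header, and it takes the target charset as an explicit argument instead of querying the locale. The input is serialized as UTF-32BE so the result does not depend on host byte order, and a failed conversion yields an empty string, mirroring the behaviour described above.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the patch): convert one Unicode
 * code point to the byte sequence used by charset `tocode`.
 * Returns the number of bytes written to `out` (which is always
 * NUL-terminated), or 0 with out[0] == '\0' if the conversion fails.
 */
static size_t
codepoint_to_charset(uint32_t cp, const char *tocode, char *out, size_t outsz)
{
    char in[4];
    char *inp = in;
    char *outp = out;
    size_t inleft = sizeof(in);
    size_t outleft;
    iconv_t cd;
    int ok;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    outleft = outsz - 1;        /* reserve room for the terminating NUL */

    /* Serialize as big-endian UTF-32, independent of host byte order. */
    in[0] = (char)((cp >> 24) & 0xFF);
    in[1] = (char)((cp >> 16) & 0xFF);
    in[2] = (char)((cp >> 8) & 0xFF);
    in[3] = (char)(cp & 0xFF);

    cd = iconv_open(tocode, "UTF-32BE");
    if (cd == (iconv_t)-1)
        return 0;               /* charset unknown to this iconv */

    /* Convert, then flush any shift state of a stateful encoding. */
    ok = iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1
      && iconv(cd, NULL, NULL, &outp, &outleft) != (size_t)-1;
    iconv_close(cd);

    if (!ok) {
        out[0] = '\0';          /* not convertible: empty string */
        return 0;
    }
    *outp = '\0';
    return (size_t)(outp - out);
}
```

In printf(1) the caller would presumably pass the current locale's charset, e.g. the result of nl_langinfo(CODESET) after setlocale(LC_ALL, ""), as `tocode`; that wiring is an assumption here and is left out to keep the sketch independent of the installed locales.
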

BTW: Note that I am leaving the output of $'...'-style string literals
open for now since this requires more thought. _Likely_ it will end up
that we use $'\u[codepoint]' for Unicode locales and fall back to
'\w[codepoint]' for non-Unicode locales, but provide options to force
one of the two. We can't make \u[codepoint] the default for non-Unicode
locales without a performance hit, since the conversion to Unicode is
not always needed (for example in a pipe), but we can keep it available
as an _option_.
IMO the $'...' issue should be deferred because right now
printf "\w[codepoint]" support is much, much more pressing for GB18030
support.

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
