On Mon, Aug 5, 2013 at 4:13 AM, Roland Mainz <[email protected]> wrote:
> Attached (as "astksh20130727_printf_w_gb18030_001.diff.txt") is a
> prototype patch which handles two issues related to GB18030:
>
> 1. Currently ksh93 supports "\u[codepoint]" in printf(1) to write a
> Unicode character to stdout. However, the current code assumes that
> |wchar_t| in the current locale represents a Unicode code point...
> which isn't true for all locales (for example zh_CN.GB18030 on Solaris
> uses GB18030 codepoints for |wchar_t| and not Unicode codepoints). In
> that case printf "\u[codepoint]" doesn't work.
> The patch fixes this by using |iconv()| to convert between the UTF-32
> codepoint value and the locale's character set. If the requested
> character cannot be represented in the current locale/encoding, printf
> "\u[codepoint]" will return an empty string.
>
> 2. The other issue is that some users strongly prefer to use the
> codepoint values of their locale and _not_ the Unicode codepoint
> value. Therefore the patch adds printf "\w[codepoint]" so that a
> codepoint can be specified using the |wchar_t| value.
>
> Note that the patch is a _prototype_ ... if the general idea is OK
> I'll craft a better patch...
>
> * Questions (mainly for David&&Glenn):
> - Is the patch OK so far ?
> - Does libast have any code to detect whether the locale is a Unicode locale ?
> - Is there any reason that printf(1)'s "%b" format does not support
> "\u[codepoint]", i.e. is this a bug or a feature ? :-)
> - Somehow I can't include <iconv.h> ... it seems the AST <iconv.h>
> header is built later than src/lib/libast/string/chresc.c, which causes
> the build to fail... any idea why ?
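
To make point 1 above a bit more concrete, here is a minimal sketch of
the |iconv()| idea. It is _not_ the code from the attached patch: the
helper name ucs4_to_codeset() and its signature are made up for
illustration, and it uses the plain POSIX |iconv()| interface rather
than anything libast-specific. It feeds the codepoint as big-endian
UTF-32 and returns 0 bytes when the character cannot be represented,
which corresponds to the "empty string" behaviour described above.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libast or the patch): convert one
 * Unicode codepoint into the character set named by `codeset`.
 * Returns the number of bytes stored in `out`, or 0 if the codepoint
 * cannot be represented or any other error occurs.
 */
size_t
ucs4_to_codeset(uint32_t cp, const char *codeset, char *out, size_t outsize)
{
    unsigned char in[4];
    char         *inp = (char *)in;
    char         *outp = out;
    size_t        inleft = sizeof(in);
    size_t        outleft = outsize;
    size_t        r;
    iconv_t       cd;

    /* Serialize as big-endian so the sketch is independent of host byte order. */
    in[0] = (unsigned char)((cp >> 24) & 0xff);
    in[1] = (unsigned char)((cp >> 16) & 0xff);
    in[2] = (unsigned char)((cp >> 8) & 0xff);
    in[3] = (unsigned char)(cp & 0xff);

    cd = iconv_open(codeset, "UTF-32BE");
    if (cd == (iconv_t)-1)
        return 0;

    /*
     * A return value other than 0 means either an error ((size_t)-1) or
     * that the implementation substituted a replacement character;
     * both are treated as "not representable" here.
     */
    r = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (r == 0)
        r = iconv(cd, NULL, NULL, &outp, &outleft); /* flush shift state */
    iconv_close(cd);

    return r == 0 ? outsize - outleft : 0;
}
```

A caller would presumably pass nl_langinfo(CODESET) as `codeset` so
that the output matches the current locale. Opening a conversion
descriptor per character is slow; real code would likely cache it.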

BTW: Note that I leave the output of $'...'-style string literals open
for now since this requires more thought.
_Likely_ it will end up with something like this: use
$'\u[codepoint]' for Unicode locales and fall back to '\w[codepoint]'
for non-Unicode locales, but provide options to force one of the two
forms (we can't force \u[codepoint] for non-Unicode locales without
a performance hit, but we keep it open as an _option_ since the
conversion to Unicode is not always needed (for example in a pipe)).
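
The "Unicode locale or not" decision could be sketched roughly as
below. This is only an assumption about how such a check might look
using POSIX nl_langinfo(CODESET); the function names are hypothetical
and this is not an existing libast interface. Note that a UTF-8
codeset alone does not strictly guarantee that |wchar_t| holds Unicode
codepoints (C99 has the __STDC_ISO_10646__ macro for that), so treat
it as a heuristic.

```c
#include <langinfo.h>
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical check: does the codeset name denote UTF-8?
 * Both spellings are accepted because platforms differ in what
 * nl_langinfo(CODESET) reports.
 */
int
codeset_is_unicode(const char *codeset)
{
    return codeset != NULL &&
        (strcasecmp(codeset, "UTF-8") == 0 ||
         strcasecmp(codeset, "UTF8") == 0);
}

/* Applies the check to the current locale (setlocale() must have been called). */
int
current_locale_is_unicode(void)
{
    return codeset_is_unicode(nl_langinfo(CODESET));
}
```

With something like this, the output code for $'...' could pick the
\u[codepoint] form when current_locale_is_unicode() is true and the
\w[codepoint] form otherwise, unless the user forces one of the two.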

IMO the $'...'-issue should be deferred because right now printf
"\w[codepoint]"-support is much much more pressing for GB18030
support.

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
