Re: [bug-gettext] broken handling of unicode code point escapes in Tcl

Guido Berhoerster Wed, 26 Jun 2013 02:27:48 -0700

* Daiki Ueno <[email protected]> [2013-06-26 04:22]:
> Guido Berhoerster <[email protected]> writes:
> 
> > I still wonder why you're substituting \u escapes with Unicode
> > characters at all, as that potentially allows unescaped control
> > sequences, which make the .po file quite fragile.
> 
> I agree that interpreting \u escapes might cause confusing output for
> Unicode control characters, but I don't think it is totally useless.
> 
> I can think of at least a couple of benefits of the current behavior:
> 
> 1. translators are provided with decoded (human-readable) strings
> 2. strings escaped in different escaping schemes (e.g. \U in Python) can
>    be unified
> 
> Perhaps one idea would be to introduce a gettext-specific Unicode
> escaping scheme (which would only escape control characters) and add
> an option to xgettext to use it.
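For illustration, a control-character-only escaping scheme could look roughly like this minimal Python sketch. It is not gettext code. The function name escape_controls, the choice of Unicode categories Cc and Cf, the \uXXXX/\UXXXXXXXX spelling, and the doubling of literal backslashes are all assumptions. Note that a no-break space (U+00A0, category Zs) passes through unescaped under this rule.

```python
import unicodedata

# Hypothetical gettext-specific scheme: only characters in these Unicode
# general categories are written back as escapes. Treating Cf (format
# characters such as U+202E) like Cc is an assumption of this sketch.
ESCAPED_CATEGORIES = {"Cc", "Cf"}


def escape_controls(text: str) -> str:
    """Escape control/format characters; leave everything else readable."""
    out = []
    for ch in text:
        if ch == "\\":
            # Double literal backslashes so the added escapes stay unambiguous.
            out.append("\\\\")
        elif unicodedata.category(ch) in ESCAPED_CATEGORIES:
            cp = ord(ch)
            # \uXXXX inside the BMP, \UXXXXXXXX beyond it (assumed spelling).
            out.append(f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # U+00E9 (Ll) and U+00A0 (Zs) pass through; U+202E (Cf) is escaped.
    print(repr(escape_controls("caf\u00e9\u00a0\u202e!")))
```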


It can be a bit more complicated than just control characters:
certain space characters such as U+00A0, U+202F or U+2001 are
also non-obvious, yet they are not control characters. Maybe a
better option would be to substitute only alphanumeric and
punctuation characters rather than all non-control characters.
Or you could simply add an option not to substitute \u escapes
at all. That is the behavior of the various native Tcl
.msg-format extractors that float around (e.g. those included
in tkabber or coccinella), and it is what I'd personally prefer.
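The three options can be compared with a small Python sketch of a Tcl-style \u decoder. It is not xgettext's implementation. The function name decode_tcl_u_escapes, the policy names, and reading "alphanumeric and punctuation" as Unicode categories L*, N* and P* are assumptions. "all" stands for substituting every escape as discussed in this thread, "safe" for the restricted substitution, and "none" for leaving escapes alone. Only \u escapes (one to four hex digits, as in Tcl 8.x) and escaped backslashes are handled.

```python
import re
import unicodedata

# Either an escaped backslash (kept verbatim) or a Tcl-style \u escape with
# one to four hex digits. Other Tcl backslash escapes are ignored here.
TCL_U_ESCAPE = re.compile(r"\\\\|\\u([0-9A-Fa-f]{1,4})")

POLICIES = ("all", "safe", "none")


def decode_tcl_u_escapes(text: str, policy: str = "all") -> str:
    """Replace \\uXXXX escapes in text according to a substitution policy.

    all:  substitute every escape
    safe: substitute only letters, digits and punctuation (categories L*, N*, P*)
    none: leave every escape untouched
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "none":
        return text

    def replace(m: re.Match) -> str:
        if m.group(1) is None:
            return m.group(0)  # escaped backslash: keep as written
        ch = chr(int(m.group(1), 16))
        if policy == "all" or unicodedata.category(ch)[0] in "LNP":
            return ch
        return m.group(0)  # keep the original escape text

    return TCL_U_ESCAPE.sub(replace, text)


if __name__ == "__main__":
    sample = r"caf\u00e9\u00a0\u0021"
    for p in POLICIES:
        print(p, repr(decode_tcl_u_escapes(sample, p)))
```

Under "safe", the é and ! are decoded while the no-break space stays visible as \u00a0, which is the distinction drawn above between readable characters and non-obvious spaces.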
-- 
Guido Berhoerster