* Daiki Ueno <[email protected]> [2013-06-26 04:22]:
> Guido Berhoerster <[email protected]> writes:
>
> > I still wonder why you're substituting \u escapes with unicode
> > characters at all, as that potentially allows unescaped control
> > sequences which make the .po file quite fragile?
>
> I agree that interpreting \u escapes might cause confusing output for
> Unicode control characters, but I don't think it is totally unuseful.
>
> I can think of at least a couple of benefits of the current behavior:
>
> 1. translators are provided with decoded (human-readable) strings
> 2. strings escaped in different escaping schemes (e.g. \U in Python) can
>    be unified
>
> Perhaps an idea might be to introduce gettext-specific Unicode escaping
> scheme (which may only escape control characters) and add an option to
> xgettext to use it.

It can be a bit more complicated than just control characters: certain
space characters such as U+00A0, U+202F or U+2001 are not control
characters, but they are just as non-obvious in a .po file. Maybe a
better option would be to substitute only alphanumeric and punctuation
characters, rather than everything that is not a control character.

Or you could simply add an option to not substitute \u escapes at all.
That is the behavior of the various native Tcl .msg-format extractors
that float around (e.g. those included in Tkabber or Coccinella), and it
is what I'd personally prefer.

-- 
Guido Berhoerster
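
To make the "alphanumeric and punctuation only" idea concrete, here is a
minimal sketch in Python. It is not xgettext code and not a proposal for
the actual implementation; the function name decode_safe_escapes is
hypothetical, and the sketch assumes escapes always have exactly four hex
digits. It decodes a \u escape only when the resulting character is in a
Unicode letter, number or punctuation category, and leaves everything
else (control characters, the spaces mentioned above, format characters,
surrogates, and also symbols such as currency signs) in escaped form.

```python
import re
import unicodedata

# Matches either an escaped backslash (kept untouched) or a \uXXXX escape.
# Assumption: escapes always carry exactly four hex digits.
_ESCAPE = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})")

# Major Unicode general categories treated as safe to show decoded:
# L* (letters), N* (numbers), P* (punctuation).
_SAFE_MAJOR_CLASSES = ("L", "N", "P")


def decode_safe_escapes(text):
    """Decode \\uXXXX escapes only for letters, numbers and punctuation.

    Hypothetical helper for illustration; every other escape is
    returned exactly as it appeared in the input.
    """
    def replace(match):
        hex_digits = match.group(1)
        if hex_digits is None:
            # An escaped backslash: not a \u escape, keep it as-is.
            return match.group(0)
        char = chr(int(hex_digits, 16))
        if unicodedata.category(char)[0] in _SAFE_MAJOR_CLASSES:
            return char
        # Spaces (Zs), controls (Cc), format characters (Cf), symbols
        # (S*) and surrogates (Cs) stay escaped.
        return match.group(0)

    return _ESCAPE.sub(replace, text)


if __name__ == "__main__":
    print(decode_safe_escapes(r"Gr\u00fc\u00dfe"))  # letters are decoded
    print(decode_safe_escapes(r"10\u00a0km"))       # U+00A0 stays escaped
```

With this rule a translator still sees readable text for ordinary
words, while anything invisible or ambiguous remains visible as an
escape in the .po file.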
