On 11/01/2019 10:23, Rudolfs Mazurs wrote:
> Hi,
> I have a service that is using xerces-c and has to be started under the C
> locale for LANG and LC_*. I need xerces to be able to parse xml with UTF-8
> characters, so I used this workaround:
> setlocale(LC_CTYPE,"en_US.UTF-8");
> XMLPlatformUtils::Initialize();
> And while it seems to work, I noticed that Initialize has a
> parameter “const char *const locale”, which I assume [1] overrides any
> system variables. However,

My understanding is that this locale parameter only affects the
selection of the message catalogue used for printing messages. Since
there is only a single en_US message catalogue, overriding it won't do
anything useful. So in terms of UTF-8 processing, I think this is a red
herring.
I would have hoped that Xerces-C would behave in a locale-independent
manner and work the same in all locales except maybe with respect to the
locale-defined stream encoding (which might be part of the problem).
Which transcoder have you configured Xerces-C to use? I notice that GNU
iconv does some querying of the current charset with setlocale (but
doesn't use the simpler and more correct nl_langinfo). If you're using
gnuiconv, maybe try ICU instead?
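
To illustrate the difference between the two ways of finding the charset, here is a rough sketch. It is not taken from the Xerces-C sources; both function names are hypothetical, and the name-parsing helper only assumes that "querying with setlocale" means picking the charset out of a locale name such as "en_US.UTF-8". It needs a POSIX system for <langinfo.h>.

```cpp
#include <clocale>
#include <string>
#include <langinfo.h>  // POSIX only: nl_langinfo, CODESET

// Hypothetical helper: pull the charset out of a locale name such as
// "en_US.UTF-8@euro". Returns "" when the name has no charset part,
// which is the case for "C" and "POSIX".
std::string codeset_from_locale_name(const std::string& name) {
    const std::string::size_type dot = name.find('.');
    if (dot == std::string::npos) return "";
    const std::string::size_type at = name.find('@', dot);
    // If there is no '@' modifier, take everything after the dot.
    return name.substr(dot + 1,
                       at == std::string::npos ? at : at - dot - 1);
}

// Hypothetical helper: ask the C library directly which charset the
// current LC_CTYPE locale uses. No parsing of locale names is involved.
std::string codeset_from_langinfo() {
    const char* cs = nl_langinfo(CODESET);
    return cs ? cs : "";
}
```

The name-parsing route yields nothing for "C", while nl_langinfo(CODESET) always reports whatever charset the C library actually associates with the active locale.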
For the software I maintain, we were forced to mandate the use of a UTF-8
locale for correct operation.
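
A startup check along these lines is one way to enforce such a requirement. This is only a minimal sketch under assumptions, not the code from that software: the function names are made up for illustration, it relies on POSIX nl_langinfo, and it treats "UTF-8" and "utf8" style spellings as equivalent.

```cpp
#include <algorithm>
#include <cctype>
#include <clocale>
#include <string>
#include <langinfo.h>  // POSIX only: nl_langinfo, CODESET

// Hypothetical helper: true if a codeset name denotes UTF-8.
// Hyphens are dropped and case is ignored, so "UTF-8", "utf8" and
// "Utf-8" all match.
bool is_utf8_codeset(std::string name) {
    name.erase(std::remove(name.begin(), name.end(), '-'), name.end());
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return name == "UTF8";
}

// Hypothetical helper: adopt the locale from the environment and report
// whether its character encoding is UTF-8.
bool process_locale_is_utf8() {
    if (std::setlocale(LC_CTYPE, "") == nullptr) {
        return false;  // the environment names a locale that is not installed
    }
    return is_utf8_codeset(nl_langinfo(CODESET));
}
```

A program would call process_locale_is_utf8() once, before initialising any XML library, and refuse to start with a clear error message if it returns false.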
Regards,
Roger