On 11/01/2019 10:23, Rudolfs Mazurs wrote:
> Hi,
> I have a service that uses xerces-c and has to be started under the C
> locale for LANG and LC_*. I need xerces to be able to parse XML with UTF-8
> characters, so I used this workaround:
>
> setlocale(LC_CTYPE,"en_US.UTF-8");
> XMLPlatformUtils::Initialize();
>
> And while it seems to work, I noticed that Initialize() has a
> parameter “const char *const locale”, which I assume [1] overrides any
> system variables. However,

My understanding is that this locale parameter only affects the selection of the message catalogue used for printing messages. Since there is only a single en_US message catalogue, overriding it won't do anything useful. So in terms of UTF-8 processing, I think this is a red herring.

I would have hoped that Xerces-C would behave in a locale-independent manner and work the same in all locales except maybe with respect to the locale-defined stream encoding (which might be part of the problem).

Which transcoder have you configured Xerces-C to use? I notice that GNU iconv does some querying of the current charset with setlocale (but doesn't use the simpler and more correct nl_langinfo). If you're using gnuiconv, maybe try ICU instead?

For the software I maintain, we were forced to mandate the use of a UTF-8 locale for correct operation.


Regards,
Roger
