This is an automated email from the ASF dual-hosted git repository.

swebb2066 pushed a commit to branch charset_documentation
in repository https://gitbox.apache.org/repos/asf/logging-log4cxx.git

commit 5a20c85d5c100bf1728d2a6dbbaf76aa0685e6e0
Author: Stephen Webb <swebb2...@gmail.com>
AuthorDate: Mon Aug 14 16:51:01 2023 +1000

    Update the Unicode support FAQ documentation
---
 src/site/markdown/faq.md | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/src/site/markdown/faq.md b/src/site/markdown/faq.md
index b23ce38a..4f36b6d3 100644
--- a/src/site/markdown/faq.md
+++ b/src/site/markdown/faq.md
@@ -47,12 +47,26 @@ DLL" with release builds of Log4cxx and "Multithread DLL Debug" with debug build
 Yes. Apache Log4cxx exposes API methods in multiple string flavors supporting differently encoded
 textual content, like `char*`, `std::string`, `wchar_t*`, `std::wstring`, `CFStringRef` et al. All
 provided texts will be converted to the `LogString` type before further processing, which is one of
-several supported Unicode representations selected by the `LOG4CXX_CHAR` cmake option. If methods are
+several supported internal representations and is selected by the `LOG4CXX_CHAR` cmake option. If methods are
 used that take `LogString` as arguments, the macro `LOG4CXX_STR()` can be used to convert literals
-to the current `LogString` type. FileAppenders support an encoding property as well, which should be
-explicitly specified to `UTF-8` or `UTF-16` for e.g. XML files. The important point is to get the
-chain of input, internal processing and output correct and that might need some additional setup in
-the app using Log4cxx:
+to the current `LogString` type.
+
+The default external representation is controlled by the `LOG4CXX_CHARSET` cmake option.
+FileAppenders support an `Encoding` property allowing character set encoding control per appender.
+For example, you can use `UTF-8` or `UTF-16` when writing XML or JSON layouts.
+Log4cxx also implements character set encodings for `US-ASCII` (`ISO646-US` or `ANSI_X3.4-1968`)
+and `ISO-8859-1` (`ISO-LATIN-1` or `CP1252`).
+You are highly encouraged to stick to `UTF-8` for the best support from tools, API and operating systems.
+
+The `locale` character set encoding provides support beyond the above internally implemented options.
+It allows you to use any multi-byte encoding provided by the standard library.
+See also [some SO post](https://stackoverflow.com/questions/571359/how-do-i-set-the-proper-initial-locale-for-a-c-program-on-windows)
+on setting the default locale in C++.
+
+```
+std::setlocale( LC_ALL, "" ); /* Set locale for C functions */
+std::locale::global(std::locale("")); /* set locale for C++ functions */
+```
 
 According to the [libc documentation](https://www.gnu.org/software/libc/manual/html_node/Setting-the-Locale.html),
 all programs start in the `C` locale by default, which is the [same as ANSI_X3.4-1968](https://stackoverflow.com/questions/48743106/whats-ansi-x3-4-1968-encoding)
@@ -72,13 +86,5 @@ loggername - ?????????? ???? ??????????????
 The important thing to understand is that this is some always applied, backwards compatible default
 behaviour and even the case when the current environment sets a locale like `en_US.UTF-8`. One might
 need to explicitly tell the app at startup to use the locale of the environment and make things
-compatible with Unicode this way. See also [some SO post](https://stackoverflow.com/questions/571359/how-do-i-set-the-proper-initial-locale-for-a-c-program-on-windows)
-on setting the default locale in C++.
-
-```
-std::setlocale( LC_ALL, "" ); /* Set locale for C functions */
-std::locale::global(std::locale("")); /* set locale for C++ functions */
-```
+compatible with Unicode this way.
 
-See [LOGCXX-483](https://issues.apache.org/jira/browse/LOGCXX-483) or [GHPR #31](https://github.com/apache/logging-log4cxx/pull/31#issuecomment-668870727)
-for additional details.
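
For readers of this change, the two-line locale snippet in the FAQ text could be wrapped for use at application startup roughly as follows. This is only a sketch using the C++ standard library: the helper name `adoptEnvironmentLocale` is hypothetical and not part of Log4cxx, and the error handling reflects the assumption that the environment may name a locale the system does not provide.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a Log4cxx API): switch the process from the
// default "C" locale to the locale named by the environment.
// Returns the name of the C locale in effect afterwards.
inline std::string adoptEnvironmentLocale()
{
    // C functions: an empty name selects the environment's locale.
    // On failure setlocale returns a null pointer and changes nothing.
    std::setlocale(LC_ALL, "");

    // C++ streams and facets: std::locale("") throws std::runtime_error
    // when the environment names a locale that is not available.
    try
    {
        std::locale::global(std::locale(""));
    }
    catch (const std::runtime_error&)
    {
        // Keep the current global locale.
    }

    // A null locale argument only queries the current setting.
    const char* current = std::setlocale(LC_ALL, nullptr);
    return current ? current : "C";
}
```

Such a helper would presumably be called once, near the top of `main()` and before any logging takes place, so that the `locale` character set encoding described in the FAQ sees the environment's settings.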