This is an automated email from the ASF dual-hosted git repository.

tschoening pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/logging-log4cxx.git

commit c65490f37e0f9ae830534770823b620988ea368b
Author: Thorsten Schöning <[email protected]>
AuthorDate: Thu Aug 6 10:36:09 2020 +0200

    Improved the FAQ regarding support of Unicode with additional details from GHPR #31.
---
 src/changes/changes.xml  |  3 ++-
 src/site/markdown/faq.md | 49 ++++++++++++++++++++++++++++--------------------
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/src/changes/changes.xml b/src/changes/changes.xml
index 4e7166d..9334c50 100644
--- a/src/changes/changes.xml
+++ b/src/changes/changes.xml
@@ -24,7 +24,7 @@
 <body>
 <release version="0.11.0"
- date="2020-02-09"
+ date="XXXX-XX-XX"
 description="Maintenance release.">
 <action issue="LOGCXX-506" type="fix">CachedDateFormat reuses timestamps without updating milliseconds after formatting timestamp with ms == 654</action>
 <action issue="LOGCXX-503" type="update">Checksums/Signatures don't match for log4cxx binaries</action>
@@ -34,6 +34,7 @@
 <action issue="LOGCXX-493" type="fix">Wrong usage of milli- vs. micro- and non- vs. milliseconds in some docs.</action>
 <action issue="LOGCXX-488" type="fix">Space after log level hides messages</action>
 <action issue="LOGCXX-484" type="fix">Spelling error s/excute/execute</action>
+ <action issue="LOGCXX-483" type="update">Not able to see hebrew values when logging in log4cxx</action>
 <action issue="LOGCXX-482" type="fix">Build failure with GCC-6</action>
 <action issue="LOGCXX-464" type="fix">TimeBasedRollingPolicy should append as configured on rollover</action>
 <action issue="LOGCXX-446" type="fix">make install fails, trying to overwrite header files</action>
diff --git a/src/site/markdown/faq.md b/src/site/markdown/faq.md
index fadfc9b..190ff4d 100644
--- a/src/site/markdown/faq.md
+++ b/src/site/markdown/faq.md
@@ -45,33 +45,42 @@ caller are using different C RTL's, the program will likely crash at the point.
 DLL" with release builds of log4cxx and "Multithread DLL Debug" with debug builds.
 
 ## <a name="unicode_supported"></a>Does Apache log4cxx support Unicode?
-### Multiple string flavors
-Yes. Apache log4cxx exposes API methods in multiple string flavors `const char*`, `std::string`,
-`wchar_t*`, `std::wstring`, `CFStringRef` et al. `const char*` and `std::string` are interpreted
-according to the current locale settings. Applications should call `setlocale(LC_ALL, "")` on
-startup or the C RTL will assume `US-ASCII`. Before being processed internally, all these are
-converted to the `LogString` type which is one of several supported Unicode representations selected
-by the `--with-logchar` option. When using methods that take `LogString` as arguments, the macro
-`LOG4CXX_STR()` can be used to convert ASCII literals to the current `LogString` type. FileAppenders
-support an encoding property which should be explicitly specified to `UTF-8` or `UTF-16` for XML
-files.
-
-### Example of wrong non-English logging
-
-For example, here is some Hebrew text which says "People with disabilities":
+Yes. Apache log4cxx exposes API methods in multiple string flavors supporting differently encoded
+textual content, like `char*`, `std::string`, `wchar_t*`, `std::wstring`, `CFStringRef` et al. All
+provided texts will be converted to the `LogString` type before further processing, which is one of
+several supported Unicode representations selected by the `--with-logchar` option. If methods are
+used that take `LogString` as arguments, the macro `LOG4CXX_STR()` can be used to convert literals
+to the current `LogString` type. FileAppenders support an encoding property as well, which should be
+explicitly specified to `UTF-8` or `UTF-16` for e.g. XML files. The important point is to get the
+chain of input, internal processing and output correct and that might need some additional setup in
+the app using log4cxx:
+
+According to the [libc documentation](https://www.gnu.org/software/libc/manual/html_node/Setting-the-Locale.html),
+all programs start in the `C` locale by default, which is the [same as ANSI_X3.4-1968](https://stackoverflow.com/questions/48743106/whats-ansi-x3-4-1968-encoding)
+and what's commonly known as the encoding `US-ASCII`. That encoding supports a very limited set of
+characters only, so inputting Unicode with that encoding in effect to output characters can't work
+properly. For example, here is some Hebrew text which says "People with disabilities":
 
 נשים עם מוגבלות
 
-If you are to log this information on a system with a locale of `en_US.UTF-8`, the log message might
-look something like the following, because the given characters can't be converted to `US-ASCII`:
+If you are to log this information, output on some console might be like the following, simply
+because the app uses `US-ASCII` by default and that can't map those characters:
 
 ```
 loggername - ?????????? ???? ??????????????
 ```
 
-Executing `std::setlocale(LC_ALL, "")` either before actually logging the text above or at the app-
-startup will allow the message to be logged appropriately. See issue [LOG4CXX-483][1] for more
-information.
+The important thing to understand is that this is some always applied, backwards compatible default
+behaviour and even the case when the current environment sets a locale like `en_US.UTF-8`. One might
+need to explicitly tell the app at startup to use the locale of the environment and make things
+compatible with Unicode this way. See also [some SO post](https://stackoverflow.com/questions/571359/how-do-i-set-the-proper-initial-locale-for-a-c-program-on-windows)
+on setting the default locale in C++.
+
+```
+std::setlocale( LC_ALL, "" ); /* Set locale for C functions */
+std::locale::global(std::locale("")); /* set locale for C++ functions */
+```
 
-[1]:https://issues.apache.org/jira/browse/LOGCXX-483
+See [LOGCXX-483](https://issues.apache.org/jira/browse/LOGCXX-483) or [GHPR #31](https://github.com/apache/logging-log4cxx/pull/31#issuecomment-668870727)
+for additional details.
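
The two locale calls added to the FAQ can fail when the environment names a locale that is not installed: `std::setlocale` then returns a null pointer and `std::locale("")` throws `std::runtime_error`. The following is a minimal sketch of a startup helper that tolerates both cases. The function name `useEnvironmentLocale` is hypothetical and not part of log4cxx; only standard library calls are used.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>

// Hypothetical helper (not a log4cxx API): adopt the locale of the
// environment for both the C and the C++ runtime.
// Returns true only if both calls succeeded.
bool useEnvironmentLocale()
{
    // C functions: setlocale returns a null pointer if the locale named by
    // the environment cannot be honoured; the current locale is then kept.
    const bool cOk = std::setlocale(LC_ALL, "") != nullptr;

    bool cppOk = true;
    try {
        // C++ streams and facets: constructing std::locale("") throws
        // std::runtime_error if the environment names an unknown locale.
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
        cppOk = false; // the previous global C++ locale stays in effect
    }
    return cOk && cppOk;
}
```

An application would presumably call this once at the beginning of `main()`, before the first log statement, and could fall back to writing ASCII-only messages if it returns `false`.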
