[ https://issues.apache.org/jira/browse/LOG4J2-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659101#comment-13659101 ]
Gary Gregory commented on LOG4J2-255:
-------------------------------------

I think I might have figured it out; the bottom line is that you should really specify an encoding for your appenders. Neither UTF-8 nor the platform default is guaranteed to be right, but the platform default is likely to be worse. (A couple of small illustrative sketches follow the quoted issue below.)

1) Let me walk us through it. You are saying that if I encode my .java source files in encoding X (leaving aside the scenario where messages live in an external file such as a .properties file), then I should set the encoding of my appenders to match X. However, I can use Unicode escapes in a Java String literal to create any Unicode String, no matter what encoding my .java source file uses. Because Java Strings are Unicode strings, the encoding of the source file does not matter at runtime; the compiler and the JVM work with Unicode strings. If my .java files use an encoding other than the platform encoding, I only have to tell the compiler about it so that it can read the source bytes correctly and turn them into String objects.

As we all know, if I have a Java String and I want bytes, I need an encoding to convert the String to bytes. Therefore, the encoding of the source file is irrelevant. What matters is that the JVM holds a Unicode String object, and we must give it an encoding to produce the bytes that get written somewhere.

So, back to the Russians: if the JVM has a (Unicode) String containing Cyrillic characters, I had better hand it an encoding that knows what to do with those characters; ASCII, for example, will not do. If I rely on the platform encoding, who knows what I will get; on Windows, for example, the default encoding Cp1252 has no Cyrillic characters. If UTF-8 is the default (recall that UTF stands for Unicode Transformation Format), I should get better results.

The bottom line is that for predictable results I should always specify an encoding in my configuration and not rely on the platform encoding. If I do not specify one and fall back to the platform encoding, I will sometimes get the expected output, sometimes junk, and other times different kinds of junk, all depending on the platform. On the other hand, if I do not specify one and fall back to UTF-8, I am likely to get better results, and if I do get junk, it will at least be the same junk on every platform.

2) The only encodings you can count on are the six listed in http://docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html. In practice, different JRE and JDK implementations provide many additional encodings, but those are not required. Therefore, to write portable tests, we should not expect them to be there; we should rely only on the required six. See my changes to CharsetTests.java.

Please help me find holes in this or support it ;)

> Multi-byte character strings are scrambled in log output
> --------------------------------------------------------
>
>                 Key: LOG4J2-255
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-255
>             Project: Log4j 2
>          Issue Type: Bug
>          Components: Appenders, Core
>    Affects Versions: 2.0-beta6
>            Reporter: Remko Popma
>            Assignee: Remko Popma
>            Priority: Blocker
>             Fix For: 2.0-beta7
>
>
> When I tried to log a Japanese string the output was scrambled in both the Console and a log file.
> For example,
> logger.warn("日本語テスト"); // (Japanese test)
> came out as
> 15:07:00.184 [main] WARN test.JapaneseTest - 譌・譛ャ隱槭ユ繧ケ繝?
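To make point 1) concrete, here is a minimal, stand-alone sketch of my own (the class name is hypothetical; this is not part of the Log4j code base). It shows that the bytes produced from the same Unicode String depend entirely on the Charset chosen at conversion time, not on how the String got into the program:

import java.nio.charset.Charset;
import java.util.Arrays;

// Stand-alone sketch: the bytes you get from a Unicode String depend only on the
// Charset you pass when converting, never on the encoding of this source file.
public class EncodingSketch {

    public static void main(String[] args) {
        // The Japanese test string from the issue, written with Unicode escapes so
        // that the encoding of this .java file is irrelevant.
        String message = "\u65e5\u672c\u8a9e\u30c6\u30b9\u30c8"; // 日本語テスト

        // Platform default: the result differs from machine to machine; on a
        // Cp1252 platform the Japanese characters cannot be represented at all
        // and are replaced with '?'.
        byte[] defaultBytes = message.getBytes();
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + Arrays.toString(defaultBytes));

        // Explicit UTF-8: the same bytes on every platform.
        byte[] utf8Bytes = message.getBytes(Charset.forName("UTF-8"));
        System.out.println("UTF-8: " + Arrays.toString(utf8Bytes));
    }
}

Running this on machines with different default charsets shows different "default" output while the UTF-8 output never changes, which is the whole argument for configuring the appender charset explicitly.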
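And for point 2), a hypothetical portability-minded test in the spirit of the CharsetTests.java change mentioned above (my own illustration, not the actual patch): only the six required charsets are asserted unconditionally, and anything beyond them is checked for availability first.

import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;

import java.nio.charset.Charset;
import org.junit.Test;

// Hypothetical test class, not the actual CharsetTests.java change.
public class RequiredCharsetsTest {

    // The six charsets every conforming Java implementation must support,
    // per the Charset javadoc linked above.
    private static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    @Test
    public void requiredCharsetsAreAlwaysPresent() {
        for (String name : REQUIRED) {
            assertTrue(name + " must be supported", Charset.isSupported(name));
            assertNotNull(Charset.forName(name));
        }
    }

    @Test
    public void optionalCharsetsAreNotAssumed() {
        // Shift_JIS, Cp1252, KOI8-R and friends are common but not required,
        // so a portable test may only use them when they are actually available.
        if (Charset.isSupported("Shift_JIS")) {
            assertNotNull(Charset.forName("Shift_JIS"));
        }
    }
}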