[ https://issues.apache.org/jira/browse/LOG4J2-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659101#comment-13659101 ]
Gary Gregory commented on LOG4J2-255:
-------------------------------------

I think I might have figured it out; the bottom line is that you should really specify an encoding for your appenders. Neither UTF-8 nor the platform default is guaranteed to be right, but the platform default is likely to be worse. (A couple of small illustrative sketches follow the quoted issue below.)

1) Let me walk us through it. You are saying that if I encode my .java source files in encoding X (leaving aside the scenario where messages live in an external file such as a .properties file), then I should set the encoding of my appenders to match X. However, I can use Unicode escapes in a Java String literal to create any Unicode String, no matter what encoding my .java source file uses. Because Java Strings are Unicode strings, the encoding of the source file does not matter at runtime; the compiler and the JVM work with Unicode strings. If my .java files use an encoding other than the platform encoding, I only have to tell the compiler about it so that it can read the source bytes correctly and turn them into String objects.

As we all know, if I have a Java String and I want bytes, I need an encoding to convert the String to bytes. Therefore, the encoding of the source file is irrelevant. What matters is that the JVM holds a Unicode String object, and we must give it an encoding to produce the bytes that get written somewhere.

So, back to the Russians: if the JVM has a (Unicode) String containing Cyrillic characters, I had better hand it an encoding that knows what to do with those characters; ASCII, for example, will not do. If I rely on the platform encoding, who knows what I will get; on Windows, for example, the default encoding Cp1252 has no Cyrillic characters. If UTF-8 is the default (recall that UTF stands for Unicode Transformation Format), I should get better results.

The bottom line is that for predictable results I should always specify an encoding in my configuration and not rely on the platform encoding. If I do not specify one and fall back to the platform encoding, I will sometimes get the expected output, sometimes junk, and other times different kinds of junk, all depending on the platform. On the other hand, if I do not specify one and fall back to UTF-8, I am likely to get better results, and if I do get junk, it will at least be the same junk on every platform.

2) The only encodings you can count on are the six listed in http://docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html. In practice, different JRE and JDK implementations provide many additional encodings, but those are not required. Therefore, to write portable tests, we should not expect them to be there; we should rely only on the required six. See my changes to CharsetTests.java.

Please help me find holes in this or support it ;)

> Multi-byte character strings are scrambled in log output
> --------------------------------------------------------
>
>                 Key: LOG4J2-255
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-255
>             Project: Log4j 2
>          Issue Type: Bug
>          Components: Appenders, Core
>    Affects Versions: 2.0-beta6
>            Reporter: Remko Popma
>            Assignee: Remko Popma
>            Priority: Blocker
>             Fix For: 2.0-beta7
>
>
> When I tried to log a Japanese string the output was scrambled in both the Console and a log file.
> For example,
> logger.warn("日本語テスト"); // (Japanese test)
> came out as
> 15:07:00.184 [main] WARN test.JapaneseTest - 譌・譛ャ隱槭ユ繧ケ繝?
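To make point 1) concrete, here is a minimal, stand-alone sketch of my own (the class name is hypothetical; this is not part of the Log4j code base). It shows that the bytes produced from the same Unicode String depend entirely on the Charset chosen at conversion time, not on how the String got into the program:

import java.nio.charset.Charset;
import java.util.Arrays;

// Stand-alone sketch: the bytes you get from a Unicode String depend only on the
// Charset you pass when converting, never on the encoding of this source file.
public class EncodingSketch {

    public static void main(String[] args) {
        // The Japanese test string from the issue, written with Unicode escapes so
        // that the encoding of this .java file is irrelevant.
        String message = "\u65e5\u672c\u8a9e\u30c6\u30b9\u30c8"; // 日本語テスト

        // Platform default: the result differs from machine to machine; on a
        // Cp1252 platform the Japanese characters cannot be represented at all
        // and are replaced with '?'.
        byte[] defaultBytes = message.getBytes();
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + Arrays.toString(defaultBytes));

        // Explicit UTF-8: the same bytes on every platform.
        byte[] utf8Bytes = message.getBytes(Charset.forName("UTF-8"));
        System.out.println("UTF-8: " + Arrays.toString(utf8Bytes));
    }
}

Running this on machines with different default charsets shows different "default" output while the UTF-8 output never changes, which is the whole argument for configuring the appender charset explicitly.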
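And for point 2), a hypothetical portability-minded test in the spirit of the CharsetTests.java change mentioned above (my own illustration, not the actual patch): only the six required charsets are asserted unconditionally, and anything beyond them is checked for availability first.

import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;

import java.nio.charset.Charset;
import org.junit.Test;

// Hypothetical test class, not the actual CharsetTests.java change.
public class RequiredCharsetsTest {

    // The six charsets every conforming Java implementation must support,
    // per the Charset javadoc linked above.
    private static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    @Test
    public void requiredCharsetsAreAlwaysPresent() {
        for (String name : REQUIRED) {
            assertTrue(name + " must be supported", Charset.isSupported(name));
            assertNotNull(Charset.forName(name));
        }
    }

    @Test
    public void optionalCharsetsAreNotAssumed() {
        // Shift_JIS, Cp1252, KOI8-R and friends are common but not required,
        // so a portable test may only use them when they are actually available.
        if (Charset.isSupported("Shift_JIS")) {
            assertNotNull(Charset.forName("Shift_JIS"));
        }
    }
}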