[ 
https://issues.apache.org/jira/browse/LOG4J2-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659476#comment-13659476
 ] 

Nick Williams commented on LOG4J2-255:
--------------------------------------

Okay, I think I understand all of this better. The 100% correct solution that 
will work all of the time is to change all of the computers in the world to 
have a default UTF-8 platform encoding. Too bad we don't have the power to do 
that... ;-)

Here's what I think should be happening:

Internally, absolutely everything should be handled as UTF-8 for consistency's 
sake. However, when dealing with external resources:

- Data transmitted over the wire or interprocess (such as net, flume, etc.) 
should use UTF-8 exclusively.

- XML written to a file or other non-network output stream should use UTF-8 
exclusively.

- Data read from files or other non-network input streams should detect the 
file encoding (is this possible? do we have to just rely on the platform 
default here?) and read in that file encoding, converting to Unicode upon 
reading (which should happen automatically, since all Strings in Java are 
Unicode). My understanding of XML is that you SHOULD always encode it in a 
Unicode encoding such as UTF-8 or UTF-16, but not everybody does.

- Data written to files or other output streams (including the Console) should 
use the platform default encoding if no explicit encoding is specified. Every 
AbstractStringLayout should provide a way to specify an encoding that overrides 
the platform default encoding. AbstractStringLayout already does this by having 
a mandatory constructor that takes a Charset. However, it doesn't account for 
the possibility that it is constructed with a null Charset. IMO, it should be 
setting the Charset to the platform default if it's constructed with a null 
Charset. Furthermore, every class that extends AbstractStringLayout should use 
this Charset /except/ XMLLayout, which should ALWAYS use UTF-8. The 
`@PluginAttr("charset") String charsetName` parameter for 
XMLLayout#createLayout should be removed, the `Charset charset` parameter for 
XMLLayout#XMLLayout should be removed, and UTF-8 should be hardcoded as the 
value for super(). (In fact, right now the XMLLayout is broken, because it 
accepts a user-supplied Charset but the header is hard-coded to <?xml 
version="1.0" encoding="UTF-8"?>.)
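
To make the last point concrete, here is a rough sketch of what I mean. It is 
not the actual Log4j code: StringLayoutSketch, PatternLayoutSketch and 
XmlLayoutSketch are made-up stand-ins for AbstractStringLayout and its 
subclasses, and the method names are assumptions for illustration only. It 
also uses StandardCharsets, which requires Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LayoutCharsetSketch {

    /** Hypothetical stand-in for AbstractStringLayout (not the real class). */
    static abstract class StringLayoutSketch {
        private final Charset charset;

        protected StringLayoutSketch(Charset charset) {
            // A null Charset falls back to the platform default encoding.
            this.charset = charset != null ? charset : Charset.defaultCharset();
        }

        Charset getCharset() {
            return charset;
        }

        /** Encodes an already formatted event with the layout's charset. */
        byte[] toBytes(String formattedEvent) {
            return formattedEvent.getBytes(charset);
        }
    }

    /** Hypothetical ordinary layout: honours the user-supplied charset. */
    static final class PatternLayoutSketch extends StringLayoutSketch {
        PatternLayoutSketch(Charset charset) {
            super(charset);
        }
    }

    /** Hypothetical XML layout: no charset parameter, always UTF-8. */
    static final class XmlLayoutSketch extends StringLayoutSketch {
        XmlLayoutSketch() {
            super(StandardCharsets.UTF_8);
        }

        /** The header is derived from the charset, so the two cannot disagree. */
        String header() {
            return "<?xml version=\"1.0\" encoding=\"" + getCharset().name() + "\"?>";
        }
    }

    public static void main(String[] args) {
        StringLayoutSketch pattern = new PatternLayoutSketch(null);
        XmlLayoutSketch xml = new XmlLayoutSketch();
        System.out.println("pattern layout charset: " + pattern.getCharset());
        System.out.println("xml layout header: " + xml.header());
    }
}
```

Building the header from the layout's own Charset is just one way to keep the 
declared encoding and the bytes actually written in sync; hard-coding both to 
UTF-8 would work equally well.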

(Side note: Strings in Java are Unicode, not UTF-8. Some of the people 
commenting here have used these terms interchangeably, but they are not 
interchangeable. Unicode is the system that assigns a number (a code point) to 
each character. UTF-8, UTF-16, UTF-32, etc. are different schemes for encoding 
those code points as bytes. 
http://stackoverflow.com/questions/643694/utf-8-vs-unicode)
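
A small illustration of the distinction, using the string from the bug report. 
The class and method names are made up for this example. The same six 
characters become different byte sequences depending on the encoding, and 
decoding the bytes with a different charset than the one used to write them 
yields garbage, which I assume is the kind of mismatch behind the scrambled 
output reported here. ISO-8859-1 is used as the "wrong" charset only because 
every JVM is guaranteed to support it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnicodeVsEncoding {

    // The string from the report, written with Unicode escapes so that this
    // source file compiles the same way under any source-file encoding.
    static final String TEXT = "\u65E5\u672C\u8A9E\u30C6\u30B9\u30C8";

    /** Number of Unicode code points, independent of any byte encoding. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Number of bytes the string occupies in the given encoding. */
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    /** Encodes with one charset and decodes with another. */
    static String transcode(String s, Charset writeWith, Charset readWith) {
        return new String(s.getBytes(writeWith), readWith);
    }

    public static void main(String[] args) {
        System.out.println("code points:  " + codePoints(TEXT));
        System.out.println("UTF-8 bytes:  " + encodedLength(TEXT, StandardCharsets.UTF_8));
        System.out.println("UTF-16 bytes: " + encodedLength(TEXT, StandardCharsets.UTF_16BE));

        // Matching charsets round-trip cleanly; mismatched ones do not.
        String ok = transcode(TEXT, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String bad = transcode(TEXT, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("round trip intact: " + TEXT.equals(ok));
        System.out.println("mismatch intact:   " + TEXT.equals(bad));
    }
}
```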
                
> Multi-byte character strings are scrambled in log output
> --------------------------------------------------------
>
>                 Key: LOG4J2-255
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-255
>             Project: Log4j 2
>          Issue Type: Bug
>          Components: Appenders, Core
>    Affects Versions: 2.0-beta6
>            Reporter: Remko Popma
>            Assignee: Remko Popma
>            Priority: Blocker
>             Fix For: 2.0-beta7
>
>
> When I tried to log a Japanese string the output was scrambled in both the 
> Console and a log file.
> For example,
> logger.warn("日本語テスト"); // (Japanese test)
> came out as
> 15:07:00.184 [main] WARN  test.JapaneseTest - 譌・譛ャ隱槭ユ繧ケ繝?
> This is the log4j2.xml configuration:
> <?xml version="1.0" encoding="UTF-8"?>
> <configuration status="warn">
>     <appenders>
>         <Console name="Console" target="SYSTEM_OUT">
>             <PatternLayout>
>                 <pattern>%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n
>                 </pattern>
>             </PatternLayout>
>         </Console>
>         <File name="tracelog" fileName="trace-log.txt" immediateFlush="true" 
> append="false">
>             <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level 
> %logger{36} - %msg%n"/>
>         </File>
>     </appenders>
>     
>     <loggers>
>         <root level="trace">
>             <appender-ref ref="Console"/>
>             <appender-ref ref="tracelog"/>
>         </root>
>     </loggers>
> </configuration>

