[ https://issues.apache.org/jira/browse/LOG4J2-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659476#comment-13659476 ]
Nick Williams commented on LOG4J2-255:
--------------------------------------

Okay, I think I understand all of this better. The 100% correct solution that will work all of the time is to change all of the computers in the world to have a default UTF-8 platform encoding. Too bad we don't have the power to do that... ;-)

Here's what I think should be happening:

Internally, absolutely everything should be handled as UTF-8 for consistency's sake. However, when dealing with external resources:

- Data transmitted over the wire or interprocess (such as net, flume, etc.) should use UTF-8 exclusively.
- XML written to a file or other non-network output stream should use UTF-8 exclusively.
- Data read from files or other non-network input streams should detect the file encoding (is this possible? do we have to just rely on the platform default here?) and read in that file encoding, converting to Unicode upon reading (which should happen automatically, since all Strings in Java are Unicode). My understanding of XML is that you SHOULD always encode it in a Unicode encoding such as UTF-8 or UTF-16, but not everybody does.
- Data written to files or other output streams (including the Console) should use the platform default encoding if no explicit encoding is specified.

Every AbstractStringLayout should provide a way to specify an encoding that overrides the platform default encoding. AbstractStringLayout already does this by having a mandatory constructor that takes a Charset. However, it doesn't account for the possibility that it is constructed with a null Charset. IMO, it should set the Charset to the platform default if it's constructed with a null Charset. Furthermore, every class that extends AbstractStringLayout should use this Charset /except/ XMLLayout, which should ALWAYS use UTF-8.
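
The null-Charset fallback and the XMLLayout exception proposed above could look roughly like the sketch below. It is only an illustration of the idea: `CharsetFallbackSketch`, `StringLayoutSketch`, `PatternLayoutSketch` and `XmlLayoutSketch` are hypothetical stand-ins invented for this example, not the real Log4j 2 classes, and the real constructors and factory methods may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for illustration only; NOT the real Log4j 2 classes.
public class CharsetFallbackSketch {

    /** Plays the role of AbstractStringLayout: a null Charset means "platform default". */
    public abstract static class StringLayoutSketch {
        private final Charset charset;

        protected StringLayoutSketch(Charset charset) {
            // Fall back to the platform default instead of keeping a null reference.
            this.charset = (charset == null) ? Charset.defaultCharset() : charset;
        }

        public Charset getCharset() {
            return charset;
        }

        /** Encodes already-formatted text with the layout's charset. */
        public byte[] toBytes(String text) {
            return text.getBytes(charset);
        }
    }

    /** An ordinary layout: honours whatever charset the user configured. */
    public static final class PatternLayoutSketch extends StringLayoutSketch {
        public PatternLayoutSketch(Charset charset) {
            super(charset);
        }
    }

    /** The XML layout takes no charset parameter and always passes UTF-8 to super(). */
    public static final class XmlLayoutSketch extends StringLayoutSketch {
        public XmlLayoutSketch() {
            super(StandardCharsets.UTF_8);
        }

        /** The declared encoding is derived from the charset, so the two cannot disagree. */
        public String header() {
            return "<?xml version=\"1.0\" encoding=\"" + getCharset().name() + "\"?>";
        }
    }

    public static void main(String[] args) {
        StringLayoutSketch fallback = new PatternLayoutSketch(null);
        StringLayoutSketch explicit = new PatternLayoutSketch(StandardCharsets.UTF_16);
        XmlLayoutSketch xml = new XmlLayoutSketch();

        // The Japanese test string from the issue, written with Unicode escapes
        // so the result does not depend on the source file encoding.
        String message = "\u65E5\u672C\u8A9E\u30C6\u30B9\u30C8";

        System.out.println("null charset resolves to: " + fallback.getCharset());
        System.out.println("explicit charset kept:    " + explicit.getCharset());
        System.out.println("XML header:               " + xml.header());
        System.out.println("UTF-8 byte count:         " + xml.toBytes(message).length);
    }
}
```

Deriving the header from the charset, rather than hard-coding both separately, is one way to rule out a mismatch between the declared and the actual encoding of the XML output.
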
The `@PluginAttr("charset") String charsetName` parameter for XMLLayout#createLayout should be removed, the `Charset charset` parameter for XMLLayout#XMLLayout should be removed, and UTF-8 should be hardcoded as the value for super(). (In fact, right now the XMLLayout is broken, because it accepts a user-supplied Charset but the header is hard-coded to <?xml version="1.0" encoding="UTF-8"?>.)

(Side note: Strings in Java are Unicode, not UTF-8. Some of the people commenting here have used these terms interchangeably, but they are not interchangeable. Unicode is the system of assigning numbers (code points) to characters. UTF-8, UTF-16, UTF-32, etc. are different encodings that map those Unicode code points to and from bytes. http://stackoverflow.com/questions/643694/utf-8-vs-unicode)

> Multi-byte character strings are scrambled in log output
> --------------------------------------------------------
>
>                 Key: LOG4J2-255
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-255
>             Project: Log4j 2
>          Issue Type: Bug
>          Components: Appenders, Core
>    Affects Versions: 2.0-beta6
>            Reporter: Remko Popma
>            Assignee: Remko Popma
>            Priority: Blocker
>             Fix For: 2.0-beta7
>
>
> When I tried to log a Japanese string the output was scrambled in both the Console and a log file.
> For example,
> logger.warn("日本語テスト"); // (Japanese test)
> came out as
> 15:07:00.184 [main] WARN test.JapaneseTest - 譌・譛ャ隱槭ユ繧ケ繝?
> This is the log4j2.xml configuration:
> <?xml version="1.0" encoding="UTF-8"?>
> <configuration status="warn">
>   <appenders>
>     <Console name="Console" target="SYSTEM_OUT">
>       <PatternLayout>
>         <pattern>%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n
>         </pattern>
>       </PatternLayout>
>     </Console>
>     <File name="tracelog" fileName="trace-log.txt" immediateFlush="true" append="false">
>       <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
>     </File>
>   </appenders>
>
>   <loggers>
>     <root level="trace">
>       <appender-ref ref="Console"/>
>       <appender-ref ref="tracelog"/>
>     </root>
>   </loggers>
> </configuration>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-dev-unsubscr...@logging.apache.org
For additional commands, e-mail: log4j-dev-h...@logging.apache.org