logback / LOGBACK-1642 [Open]
LayoutWrappingEncoder does not use the correct default charset for the console

==============================

Here's what changed in this issue in the last few minutes.
This issue has been created
This issue is now assigned to you.

View or comment on issue using this link
https://jira.qos.ch/browse/LOGBACK-1642

==============================
 Issue created
------------------------------

Garret Wilson created this issue on 29/May/22 9:59 PM
Summary:              LayoutWrappingEncoder does not use the correct default 
charset for the console
Issue Type:           Bug
Affects Versions:     1.2.11
Assignee:             Logback dev list
Components:           logback-core
Created:              29/May/22 9:59 PM
Environment:          Windows 10; Java 17; locale {{en-US}}
Priority:             Major
Reporter:             Garret Wilson
Description:
  Let's say I have this typical Logback configuration to write output to 
{{stderr}}, with pretty colors and such:
  
  {code:xml}
  <configuration>
    <property scope="context" name="COLORIZER_COLORS" 
value="boldred@,boldyellow@,boldcyan@,@,@" />
    <conversionRule conversionWord="colorize" 
converterClass="org.tuxdude.logback.extensions.LogColorizer" />
    <statusListener class="ch.qos.logback.core.status.NopStatusListener" />
    <appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
      <target>System.err</target>
      <withJansi>true</withJansi>
      <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>[%colorize(%level)] %msg%n</pattern>
      </encoder>
    </appender>
    <root level="INFO">
      <appender-ref ref="STDERR" />
    </root>
  </configuration>
  {code}
  
  The key part is that I have a {{PatternLayoutEncoder}} (a descendant of 
{{LayoutWrappingEncoder}}) logging via a {{ConsoleAppender}} to {{System.err}}.
  
  The default charset for a {{LayoutWrappingEncoder}} ([discussed in depth on 
Stack Overflow|https://stackoverflow.com/q/32207432]) is 
{{Charset.defaultCharset()}}. (How it gets that [is 
complicated|https://stackoverflow.com/a/12659462], but ultimately it relies on 
{{String.getBytes()}}.) There's just one big problem: when a console is 
attached, the charset of {{System.out}} and {{System.err}} is 
{{System.console().charset()}}, not {{Charset.defaultCharset()}}, as per the 
API documentation for e.g. {{System.out}}:
  
  {quote}
  The "standard" output stream. This stream is already open and ready to accept 
output data. Typically this stream corresponds to display output or another 
output destination specified by the host environment or user. The encoding used 
in the conversion from characters to bytes is equivalent to 
{{Console.charset()}} if the {{Console}} exists, {{Charset.defaultCharset()}} 
otherwise.
  {quote}
  
  On my system for example, {{Charset.defaultCharset()}} is set to 
{{windows-1252}}, while {{System.console().charset()}} returns {{IBM437}}. This 
results in mojibake: if I try to log the string {{"é"}} via Logback, it appears 
in {{System.out}} or {{System.err}} as {{Θ}} instead! (See [discussion on Stack 
Overflow|https://stackoverflow.com/q/72419122].)
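  
  The mismatch can be reproduced without Logback or a real console. The 
following is only a minimal sketch: the class name {{MojibakeDemo}} and the 
helper {{reinterpret}} are made up for illustration, and it assumes the JDK in 
use ships both the {{windows-1252}} and {{IBM437}} charsets. It encodes a string 
the way the encoder would and decodes the bytes the way the console would:

```java
import java.nio.charset.Charset;

public class MojibakeDemo {

    /** Encodes text with one charset, then decodes the same bytes with another. */
    static String reinterpret(String text, Charset written, Charset displayed) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, displayed);
    }

    public static void main(String[] args) {
        // Values reported for the affected machine; other systems will differ.
        Charset encoderCharset = Charset.forName("windows-1252");
        Charset consoleCharset = Charset.forName("IBM437");

        // "\u00e9" is the letter e with an acute accent.
        String shown = reinterpret("\u00e9", encoderCharset, consoleCharset);

        // Print the code point rather than the character itself, so the
        // result does not depend on the charset of the terminal running this.
        System.out.printf("U+%04X%n", (int) shown.charAt(0)); // prints U+0398
    }
}
```

  {{U+0398}} is the Greek capital theta, which matches the character seen on 
the console.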
  
  
  Thus {{LayoutWrappingEncoder}} somehow needs to default to 
{{System.console().charset()}} (instead of {{Charset.defaultCharset()}} as it 
does now) if it is appending to {{System.out}} or {{System.err}}. (I can't 
manually specify a charset because I certainly don't know what the console 
default charset will be on each user's machine, as there will be many different 
values for different users.)
  
  Unfortunately {{LayoutWrappingEncoder}} probably has no idea where it's 
writing to and probably shouldn't care. So instead, {{LayoutWrappingEncoder}} 
should be able to ask the enclosing {{OutputStreamAppender}} for the current 
charset. {{OutputStreamAppender}} could then default to 
{{Charset.defaultCharset()}} if not specified, and {{ConsoleAppender}} could 
override the default to return {{System.console().charset()}} instead of 
{{Charset.defaultCharset()}}. Problem solved, with the added benefit that the 
default charset now comes explicitly from the {{OutputStreamAppender}} 
implementation rather than indirectly from {{String.getBytes()}} hidden in the 
bowels of {{LayoutWrappingEncoder}}.
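  
  To make the proposal concrete, here is a rough sketch of the delegation. None 
of these types are Logback's actual classes: {{StreamAppenderSketch}}, 
{{ConsoleAppenderSketch}}, {{EncoderSketch}} and their methods are hypothetical 
stand-ins, and {{Console.charset()}} requires Java 17 or later. An explicitly 
configured charset still wins; otherwise the encoder asks its appender.

```java
import java.io.Console;
import java.nio.charset.Charset;

public class CharsetResolutionSketch {

    /** Hypothetical stand-in for OutputStreamAppender. */
    static class StreamAppenderSketch {
        /** Charset an encoder should use when none was configured. */
        Charset defaultCharset() {
            return Charset.defaultCharset();
        }
    }

    /** Hypothetical stand-in for ConsoleAppender. */
    static class ConsoleAppenderSketch extends StreamAppenderSketch {
        @Override
        Charset defaultCharset() {
            // System.console() is null when no console is attached,
            // e.g. when output is redirected to a file.
            Console console = System.console();
            return console != null ? console.charset() : super.defaultCharset();
        }
    }

    /** Hypothetical stand-in for LayoutWrappingEncoder. */
    static class EncoderSketch {
        private final StreamAppenderSketch parent;
        private Charset charset; // null means "not configured"

        EncoderSketch(StreamAppenderSketch parent) {
            this.parent = parent;
        }

        void setCharset(Charset charset) {
            this.charset = charset;
        }

        /** Explicit setting first, then whatever the appender reports. */
        Charset effectiveCharset() {
            return charset != null ? charset : parent.defaultCharset();
        }

        byte[] encode(String text) {
            return text.getBytes(effectiveCharset());
        }
    }

    public static void main(String[] args) {
        EncoderSketch encoder = new EncoderSketch(new ConsoleAppenderSketch());
        System.out.println("Effective charset: " + encoder.effectiveCharset());
    }
}
```

  How an encoder would obtain a reference to its appender in Logback itself is 
left open here; the sketch simply passes it to the constructor.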


==============================
 This message was sent by Atlassian Jira (v8.8.0#808000-sha1:e2c7e59)

_______________________________________________
logback-dev mailing list
logback-dev@qos.ch
http://mailman.qos.ch/mailman/listinfo/logback-dev
