Jungtaek Lim created SPARK-29160:
------------------------------------

             Summary: Event log file is written without specific charset which 
should be ideally UTF-8
                 Key: SPARK-29160
                 URL: https://issues.apache.org/jira/browse/SPARK-29160
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Jungtaek Lim


This issue comes from an observation by [~vanzin]: 
[https://github.com/apache/spark/pull/25670#discussion_r325383512]

Quoting his comment here:
{noformat}
This is a long standing bug in the original code, but this should be explicitly 
setting the charset to UTF-8 (using new PrintWriter(new 
OutputStreamWriter(...)).

The reader side should too, although doing that now could potentially break old 
logs... we should open a bug for this.{noformat}
EventLoggingListener correctly encodes to UTF-8 where it converts to byte[] 
before writing, but logEvent() does not specify a charset.

It should be fixed, but as Marcelo said, we also need to be aware that the 
change could break reading of old logs.
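
For illustration only, here is a minimal Java sketch of the idea in the quoted comment: wrap the output stream in an OutputStreamWriter with an explicit UTF-8 charset, and read it back with the same charset. This is not the Spark code or the eventual patch. The class name Utf8EventLogSketch, the helpers writeEvent and readFirstEvent, and the sample JSON line are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Spark's EventLoggingListener.
public class Utf8EventLogSketch {

    // Writes one event line. The charset is passed explicitly, so the
    // bytes do not depend on the JVM's default charset.
    static void writeEvent(OutputStream out, String json) {
        PrintWriter writer =
            new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        writer.println(json);
        writer.flush();
    }

    // Reads the first event line back, decoding with the same explicit charset.
    static String readFirstEvent(InputStream in) throws IOException {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        // Made-up event line containing a non-ASCII character (e with acute accent).
        String event = "{\"Event\":\"caf\u00e9\"}";

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeEvent(buffer, event);

        String roundTripped =
            readFirstEvent(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println("round trip preserved the line: " + event.equals(roundTripped));
    }
}
```

The round trip only works because writer and reader agree on the charset. A log written earlier under a non-UTF-8 default charset would be decoded differently by a UTF-8 reader, which is the compatibility concern raised above.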



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

