Jungtaek Lim created SPARK-29160:
------------------------------------
Summary: Event log file is written without specific charset which
should be ideally UTF-8
Key: SPARK-29160
URL: https://issues.apache.org/jira/browse/SPARK-29160
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.0.0
Reporter: Jungtaek Lim
This issue comes from an observation by [~vanzin]:
[https://github.com/apache/spark/pull/25670#discussion_r325383512]
Quoting his comment here:
{noformat}
This is a long standing bug in the original code, but this should be explicitly
setting the charset to UTF-8 (using new PrintWriter(new
OutputStreamWriter(...)).
The reader side should too, although doing that now could potentially break old
logs... we should open a bug for this.{noformat}
While EventLoggingListener correctly encodes to UTF-8 where it converts to byte[]
before writing, it doesn't specify a charset in logEvent().
This should be fixed, but as Marcelo said, we also need to be aware that the
change could break reading of old logs.
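
As a rough illustration of the suggestion quoted above, the following standalone Java sketch wraps an output stream in a PrintWriter with an explicit UTF-8 charset and reads the bytes back with the same charset. It is not the Spark code: the class name Utf8EventLogSketch, the helpers utf8Writer and writeLine, and the sample JSON line are all hypothetical and exist only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not taken from EventLoggingListener.
public class Utf8EventLogSketch {

    // Wraps a raw stream in a writer that always encodes as UTF-8,
    // instead of relying on the JVM's default charset.
    static PrintWriter utf8Writer(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    // Writes one line through the UTF-8 writer and returns the encoded bytes.
    static byte[] writeLine(String line) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintWriter writer = utf8Writer(buffer)) {
            writer.println(line);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up event line containing a non-ASCII character (e with acute accent).
        String event = "{\"Event\":\"Example\",\"App Name\":\"caf\u00e9\"}";
        byte[] written = writeLine(event);

        // The reader side decodes with the same explicit charset.
        String readBack = new String(written, StandardCharsets.UTF_8).trim();
        System.out.println(readBack.equals(event));
    }
}
```

With an explicit charset on both sides, the round trip does not depend on the platform default. Logs already written under a different default charset would still decode incorrectly when read as UTF-8, which is the compatibility concern raised in the quoted comment.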