[
https://issues.apache.org/jira/browse/SPARK-29160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932901#comment-16932901
]
Jungtaek Lim commented on SPARK-29160:
--------------------------------------
While I added only 3.0.0 as the Affected Version, all versions we support might be
affected.
> Event log file is written without specific charset which should be ideally
> UTF-8
> --------------------------------------------------------------------------------
>
> Key: SPARK-29160
> URL: https://issues.apache.org/jira/browse/SPARK-29160
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Jungtaek Lim
> Priority: Major
>
> This issue is from observation by [~vanzin] :
> [https://github.com/apache/spark/pull/25670#discussion_r325383512]
> Quoting his comment here:
> {quote}
> This is a long-standing bug in the original code, but this should be
> explicitly setting the charset to UTF-8 (using new PrintWriter(new
> OutputStreamWriter(...))).
> The reader side should too, although doing that now could potentially break
> old logs... we should open a bug for this.
> {quote}
> While EventLoggingListener converts to UTF-8 properly where it turns strings
> into byte[] before writing, it doesn't specify a charset in logEvent().
> It should be fixed, but as Marcelo said, we also need to be careful not to
> break reading of old logs.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)