[ https://issues.apache.org/jira/browse/SPARK-29160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932901#comment-16932901 ]

Jungtaek Lim commented on SPARK-29160:
--------------------------------------

While I just added 3.0.0 as the Affected Version, all versions we support might 
be affected.

> Event log file is written without specific charset which should be ideally 
> UTF-8
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-29160
>                 URL: https://issues.apache.org/jira/browse/SPARK-29160
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> This issue is from observation by [~vanzin] : 
> [https://github.com/apache/spark/pull/25670#discussion_r325383512]
> Quoting his comment here:
> {quote}
> This is a long standing bug in the original code, but this should be 
> explicitly setting the charset to UTF-8 (using new PrintWriter(new 
> OutputStreamWriter(...))).
> The reader side should too, although doing that now could potentially break 
> old logs... we should open a bug for this.
> {quote}
> While EventLoggingListener properly converts strings to UTF-8 byte[] 
> elsewhere before writing, it doesn't specify a charset in logEvent().
> It should be fixed, but as Marcelo said, we also need to be aware that 
> reading old event logs could potentially break.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)