[ https://issues.apache.org/jira/browse/SPARK-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364866#comment-16364866 ]
Bruce Robbins commented on SPARK-23410:
---------------------------------------

[~maxgekk] My simple test input of [{"field1": 10, "field2": "hello"},{"field1": 12, "field2": "byte"}] is encoded like this (according to emacs hexl-mode):
{noformat}
00000000: feff 005b 007b 0022 0066 0069 0065 006c  ...[.{.".f.i.e.l
00000010: 0064 0031 0022 003a 0020 0031 0030 002c  .d.1.".:. .1.0.,
00000020: 0020 0022 0066 0069 0065 006c 0064 0032  . .".f.i.e.l.d.2
00000030: 0022 003a 0020 0022 0068 0065 006c 006c  .".:. .".h.e.l.l
00000040: 006f 0022 007d 002c 007b 0022 0066 0069  .o.".}.,.{.".f.i
00000050: 0065 006c 0064 0031 0022 003a 0020 0031  .e.l.d.1.".:. .1
00000060: 0032 002c 0020 0022 0066 0069 0065 006c  .2.,. .".f.i.e.l
00000070: 0064 0032 0022 003a 0020 0022 0062 0079  .d.2.".:. .".b.y
00000080: 0074 0065 0022 007d 005d 000a            .t.e.".}.]..
{noformat}
I just used iconv to convert the file from UTF-8 to UTF-16.

> Unable to read jsons in charset different from UTF-8
> ----------------------------------------------------
>
>                 Key: SPARK-23410
>                 URL: https://issues.apache.org/jira/browse/SPARK-23410
>             Project: Spark
>          Issue Type: Bug
>      Components: Input/Output
>    Affects Versions: 2.3.0
>            Reporter: Maxim Gekk
>          Priority: Major
>
> Currently the JSON parser is forced to read JSON files as UTF-8. This behavior breaks backward compatibility with Spark 2.2.1 and earlier versions, which could read JSON files in UTF-16, UTF-32, and other encodings thanks to the Jackson library's charset auto-detection mechanism. We need to give users back the ability to read JSON files in a specified charset and/or to detect the charset automatically, as before.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
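As a side note, the hex dump above is consistent with what glibc iconv typically produces for a plain `UTF-16` target: a big-endian byte-order mark (the leading `fe ff`) followed by big-endian code units, which is why every ASCII character appears with a `00` byte in front of it. A minimal Python sketch (not from the ticket; purely illustrative) reconstructs such bytes and shows that a BOM-aware UTF-16 decode recovers the original JSON, while a UTF-8-only reader would see the BOM and NUL bytes as garbage:

```python
import json

# The JSON text from the comment, as fed to `iconv -f utf-8 -t utf-16`.
text = '[{"field1": 10, "field2": "hello"},{"field1": 12, "field2": "byte"}]\n'

# Big-endian BOM (fe ff at offset 0 of the dump) + UTF-16BE payload.
data = b"\xfe\xff" + text.encode("utf-16-be")

# First four bytes match the dump: fe ff 005b, where 005b is '['.
assert data[:4] == b"\xfe\xff\x00\x5b"

# Python's "utf-16" codec detects and consumes the BOM, restoring the text,
# after which an ordinary JSON parser handles it fine.
parsed = json.loads(data.decode("utf-16"))
print(parsed[0]["field2"])  # hello
print(parsed[1]["field1"])  # 12
```

This mirrors the failure mode in the ticket: the bytes themselves are valid UTF-16 JSON, and the problem is purely that the reader assumes UTF-8 instead of honoring (or being told) the actual charset.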