[ https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Punit Shah updated SPARK-32965:
-------------------------------
    Attachment: 16le.csv

> pyspark reading csv files with utf_16le encoding
> ------------------------------------------------
>
>                 Key: SPARK-32965
>                 URL: https://issues.apache.org/jira/browse/SPARK-32965
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.7, 3.0.0, 3.0.1
>            Reporter: Punit Shah
>            Priority: Major
>         Attachments: 16le.csv
>
>
> If you have a file encoded in utf_16le or utf_16be and read it with
> spark.read.csv("<file_name>", encoding="utf_16le"), the resulting DataFrame
> is not rendered properly.
>
> If instead you decode the file in Python, for example:
>   prdd = spark_session._sc.binaryFiles(path_url).values().flatMap(
>       lambda x: x.decode("utf_16le").splitlines())
> and then pass the RDD to spark.read.csv(prdd), it works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
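The workaround in the report decodes each whole file in Python before Spark parses any CSV. Below is a minimal pure-Python sketch of that decoding step, so it can be tried without a Spark cluster.

- `decode_lines` is a hypothetical helper name, not part of Spark.
- The sample data is made up.
- Stripping a leading byte-order mark is an added assumption; the report does not say whether the attached file has one.

```python
import csv
from typing import List


def decode_lines(raw: bytes, encoding: str = "utf_16le") -> List[str]:
    """Hypothetical helper: decode one whole file's bytes and split into lines.

    Mirrors the reporter's lambda: x.decode("utf_16le").splitlines().
    Pass encoding="utf_16be" for big-endian files.
    """
    text = raw.decode(encoding)
    # Assumption (not in the report): drop a leading byte-order mark so it
    # does not end up glued to the first column name.
    if text.startswith("\ufeff"):
        text = text[1:]
    # splitlines() breaks on "\n", "\r\n", "\r" and a few other Unicode line
    # boundaries; quoted CSV fields that contain newlines would be split too.
    return text.splitlines()


# Made-up sample: a small CSV encoded as UTF-16LE with a BOM.
sample = "\ufeffname,city\r\nAda,London\r\nLinus,Helsinki\r\n".encode("utf_16le")

rows = list(csv.reader(decode_lines(sample)))
print(rows)
# → [['name', 'city'], ['Ada', 'London'], ['Linus', 'Helsinki']]

# In PySpark, the helper could stand in for the lambda from the report:
#   prdd = sc.binaryFiles(path_url).values().flatMap(decode_lines)
#   df = spark.read.csv(prdd, header=True)
```

Because each file is decoded as a single value, a multi-byte line ending such as UTF-16LE's `b"\n\x00"` is never split at the byte level. The trade-off is that every file must fit in memory on one executor.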