Hello,

Per the documentation, Spark's default character encoding is UTF-8. But when I try to read non-ASCII characters, Spark tends to render them as question marks. What am I doing wrong? Below is my syntax:
val ds = spark.read.textFile("a .bz2 file from hdfs")
ds.show()

The string "KøBENHAVN" gets displayed as "K�BENHAVN". I did the testing on the spark-shell, and also ran the same command as part of a Spark job; both yield the same result. I don't know what I am missing. I read the documentation and couldn't find any explicit config for this. Any pointers will be greatly appreciated!

Thanks
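For context, my working theory (an assumption, not something I have confirmed) is that the file on HDFS is not actually UTF-8 but something like ISO-8859-1: the byte for 'ø' in Latin-1 is not valid UTF-8, so a UTF-8 decoder substitutes U+FFFD, which is exactly the character I am seeing. A minimal sketch of what I mean, plus the kind of workaround I am considering via the CSV reader's "encoding" option (the HDFS path is a placeholder):

```scala
import java.nio.charset.StandardCharsets

// Reproduce the symptom off-Spark: 'ø' is the single byte 0xF8 in ISO-8859-1,
// which is not a valid UTF-8 sequence, so the UTF-8 decoder emits U+FFFD.
val latin1Bytes = "KøBENHAVN".getBytes(StandardCharsets.ISO_8859_1)
println(new String(latin1Bytes, StandardCharsets.UTF_8)) // prints K�BENHAVN

// Workaround sketch (assumes the file really is Latin-1, which I have not verified):
// the CSV source accepts an explicit "encoding" option, unlike read.textFile.
val ds = spark.read
  .option("encoding", "ISO-8859-1") // assumption about the file's real charset
  .csv("hdfs:///path/to/file.bz2")  // placeholder path
```

If the lines contain commas the CSV reader would of course split them, so this is only a sketch, not a drop-in replacement for textFile.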