Hello,

Per the documentation, Spark's default character encoding is UTF-8. But when I try to read non-ASCII characters, Spark tends to render them as question marks. What am I doing wrong? Below is my syntax:
val ds = spark.read.textFile("a .bz2 file from hdfs")
ds.show()

The string "KøBENHAVN" gets displayed as "K�BENHAVN". I did the testing on the spark-shell, and also ran the same command as part of a Spark job; both yield the same result. I don't know what I am missing. I read the documentation and couldn't find any explicit config for this. Any pointers will be greatly appreciated!

Thanks
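For context, my working theory (an assumption, not something I have confirmed) is that the file on HDFS is not actually UTF-8 but something like ISO-8859-1: the byte for 'ø' in Latin-1 is not valid UTF-8, so a UTF-8 decoder substitutes U+FFFD, which is exactly the character I am seeing. A minimal sketch of what I mean, plus the kind of workaround I am considering via the CSV reader's "encoding" option (the HDFS path is a placeholder):

```scala
import java.nio.charset.StandardCharsets

// Reproduce the symptom off-Spark: 'ø' is the single byte 0xF8 in ISO-8859-1,
// which is not a valid UTF-8 sequence, so the UTF-8 decoder emits U+FFFD.
val latin1Bytes = "KøBENHAVN".getBytes(StandardCharsets.ISO_8859_1)
println(new String(latin1Bytes, StandardCharsets.UTF_8)) // prints K�BENHAVN

// Workaround sketch (assumes the file really is Latin-1, which I have not verified):
// the CSV source accepts an explicit "encoding" option, unlike read.textFile.
val ds = spark.read
  .option("encoding", "ISO-8859-1") // assumption about the file's real charset
  .csv("hdfs:///path/to/file.bz2")  // placeholder path
```

If the lines contain commas the CSV reader would of course split them, so this is only a sketch, not a drop-in replacement for textFile.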