Spark uses the Hadoop TextInputFormat to read the file. Since Hadoop almost 
exclusively supports Linux, UTF-8 is the only encoding supported, as it is the 
default encoding on Linux.
If your data is in another encoding, you may want to vote for this 
JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-232
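In the meantime, a common workaround is to read the raw bytes through 
hadoopFile and decode them yourself. A minimal sketch (assuming a Scala 
spark-shell where sc is the SparkContext; the path and the ISO-8859-1 
charset are placeholders for your own file and encoding):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    // Hadoop's Text holds raw bytes; decode them with the charset you
    // need instead of the UTF-8 that textFile assumes.
    val lines = sc.hadoopFile[LongWritable, Text, TextInputFormat]("/path/to/data")
      .map { case (_, text) =>
        new String(text.getBytes, 0, text.getLength, "ISO-8859-1")
      }

Note that text.getBytes returns the backing array, which can be longer than 
the actual content, so passing text.getLength matters; decoding to a String 
inside the map also avoids the object-reuse pitfall of the Hadoop RDD APIs.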
Yong

Date: Wed, 8 Apr 2015 10:35:18 -0700
Subject: Reading file with Unicode characters
From: lists.a...@gmail.com
To: user@spark.apache.org
CC: lists.a...@gmail.com

Hi,
Does SparkContext's textFile() method handle files with Unicode characters? How 
about files in UTF-8 format?
Going further, is it possible to specify an encoding to the method? If not, what 
should one do if the files to be read are in some other encoding?
Thanks,
arun
                                          
