RE: Reading file with Unicode characters

2015-04-08 Thread java8964
Spark uses the Hadoop TextInputFormat to read the file. Since Hadoop supports almost only Linux, UTF-8 is the only encoding supported, as it is the default on Linux. If your data is in another encoding, you may want to vote for this Jira: https://issues.apache.org/jira/browse/MAPREDUCE-232
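To illustrate the point, here is a minimal sketch in plain Python (no Spark involved) of what happens when bytes in another encoding are decoded as UTF-8, and how decoding the raw line bytes with the file's real charset recovers the text. The sample strings and the helper name `decode_lines` are made up for illustration. It assumes an ASCII-compatible single-byte encoding such as ISO-8859-1, where splitting on the newline byte is still safe; this is not a description of how TextInputFormat is implemented.

```python
from typing import List


def decode_lines(raw: bytes, encoding: str) -> List[str]:
    """Split raw file bytes on newlines, then decode each line with `encoding`.

    Hypothetical helper for illustration only. Splitting first is safe here
    because ISO-8859-1 encodes the newline as the same single byte as ASCII.
    """
    return [line.decode(encoding) for line in raw.splitlines()]


# Made-up sample data: two lines stored as ISO-8859-1, not UTF-8.
raw = "café\nnaïve\n".encode("iso-8859-1")

# Treating the bytes as UTF-8 turns each non-ASCII byte into U+FFFD.
as_utf8 = [line.decode("utf-8", errors="replace") for line in raw.splitlines()]

# Decoding with the file's actual charset recovers the original text.
as_latin1 = decode_lines(raw, "iso-8859-1")

print(as_utf8)
print(as_latin1)
```

The same idea should carry over to a Spark job if you can get at the undecoded bytes of each line and decode them yourself, but how to do that depends on the API and version you are using, so treat it as something to check rather than a recipe.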

Re: Reading file with Unicode characters

2015-04-08 Thread Arun Lists
Thanks!

arun

On Wed, Apr 8, 2015 at 10:51 AM, java8964 java8...@hotmail.com wrote:
> Spark use the Hadoop TextInputFormat to read the file. Since Hadoop is almost only supporting Linux, so UTF-8 is the only encoding supported, as it is the the one on Linux. If you have other encoding data,