Spark uses the Hadoop TextInputFormat to read the file. Since Hadoop mostly supports only Linux, UTF-8 is the only encoding supported, as it is the default encoding on Linux. If your data is in another encoding, you may want to vote for this JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-232

Yong
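A common workaround people use (a sketch, not an official Spark feature) is to skip textFile() entirely: read each file's raw bytes with binaryFiles() and decode them yourself with the charset you know the data is in. The paths and the "iso-8859-1" charset below are just examples.

```python
# Workaround sketch: textFile() assumes UTF-8, so read raw bytes and decode
# with an explicit charset instead.

def decode_lines(raw_bytes, encoding="iso-8859-1"):
    """Decode a file's raw bytes with the given charset and split into lines."""
    return raw_bytes.decode(encoding).splitlines()

# With a SparkContext `sc` (hypothetical path, requires PySpark):
# rdd = (sc.binaryFiles("hdfs:///data/latin1_files")
#          .flatMap(lambda kv: decode_lines(kv[1], "iso-8859-1")))
```

Note that binaryFiles() loads each whole file as one record, so this is only practical for files that fit in memory on a single executor.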
Date: Wed, 8 Apr 2015 10:35:18 -0700
Subject: Reading file with Unicode characters
From: lists.a...@gmail.com
To: user@spark.apache.org
CC: lists.a...@gmail.com

Hi,

Does SparkContext's textFile() method handle files with Unicode characters? How about files in UTF-8 format? Going further, is it possible to specify encodings to the method? If not, what should one do if the files to be read are in some encoding?

Thanks,
arun