Chiragkumar created SPARK-12557:
-----------------------------------
Summary: Spark 1.5.1 is unable to read S3 file system (Java
exception - s3a.S3AFileSystem not found)
Key: SPARK-12557
URL: https://issues.apache.org/jira/browse/SPARK-12557
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.5.1
Environment: AWS (EC2) instances + S3 + Hadoop CDH
Reporter: Chiragkumar
Hello Technical Support team,
This is a critical production issue we are facing on Spark 1.5.1. It throws a Java runtime exception,
"org.apache.hadoop.fs.s3a.S3AFileSystem not found", although the same code works perfectly on
Spark 1.3.1. Is this a known issue on Spark 1.5.1? I have opened a case with
Cloudera (CDH), but they do not fully support this yet. We use spark-shell (Scala)
heavily these days, so end users prefer that environment to run their HQL, and most of
our datasets live in S3 buckets. Note that there is no complaint when the same datasets are
read from HDFS, so the problem seems to be related to my Spark configuration or something
similar. Please help identify the root cause and a solution. The relevant technical details
follow, and a sketch of the S3A configuration involved appears after the stack traces:
scala> val rdf1 = sqlContext.sql("Select * from ntcom.nc_currency_dim").collect()
rdf1: Array[org.apache.spark.sql.Row] =
Array([-1,UNK,UNKNOWN,UNKNOWN,0.74,1.35,1.0,1.0,DBUDAL,11-JUN-2014
20:36:41,JHOSLE,2008-03-26 00:00:00.0,105.0,6.1,2014-06-11 20:36:41,2015-07-08
22:10:02,N], [-1,UNK,UNKNOWN,UNKNOWN,1.0,1.0,1.0,1.0,PDHAVA,08-JUL-2015
22:10:03,JHOSLE,2008-03-26 00:00:00.0,null,null,2015-07-08 22:10:03,3000-01-01
00:00:00,Y], [1,DKK,Danish Krone,Danish
Krone,0.13,7.46,0.180965147453,5.53,DBUDAL,11-JUN-2014
20:36:41,NCBATCH,2007-01-16 00:00:00.0,19.0,1.1,2014-06-11 20:36:41,2015-07-08
22:10:02,N], [1,DKK,Danish Krone,Danish
Krone,0.134048257372654,7.46,0.134048257372654,7.46,PDHAVA,08-JUL-2015
22:10:03,NCBATCH,2007-01-16 00:00:00.0,null,null,2015-07-08 22:10:03,3000-01-01
00:00:00,Y], [2,EUR,Euro,EMU currency
(Euro),1.0,1.0,1.35,0.74,DBUDAL,11-JUN-2014 20:36:41,NCBA...
rdf1 = sqlContext.sql("Select * from dev_ntcom.nc_currency_dim").collect()
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$2.apply(ClientWrapper.scala:303)
        at scala.Option.map(Option.scala:145)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
        ... 120 more
15/11/05 20:31:01 ERROR log: error in initSerDe: org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining schema. Returning signal schema to indicate problem: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
org.apache.hadoop.hive.serde2.SerDeException: Encountered exception determining schema. Returning signal schema to indicate problem: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs
        at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:524)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
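
For context on the configuration mentioned in the description: this ClassNotFoundException usually indicates that the hadoop-aws JAR, which provides org.apache.hadoop.fs.s3a.S3AFileSystem, is not on the spark-shell classpath. Below is a minimal sketch of how that classpath and the S3A settings are typically supplied; the JAR paths, the accompanying aws-java-sdk JAR, and the credential placeholders are illustrative assumptions, not values taken from this environment.

# Launch spark-shell with the S3A implementation and its AWS SDK dependency on the
# classpath (paths are placeholders for whatever the CDH installation provides):
spark-shell --jars /path/to/hadoop-aws.jar,/path/to/aws-java-sdk.jar

// Inside the shell, point Hadoop at the S3A implementation and credentials:
scala> val hadoopConf = sc.hadoopConfiguration
scala> hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
scala> hadoopConf.set("fs.s3a.access.key", "<AWS_ACCESS_KEY>")  // placeholder credentials
scala> hadoopConf.set("fs.s3a.secret.key", "<AWS_SECRET_KEY>")  // placeholder credentials

// With the class on the classpath, the S3-backed table should resolve without the error:
scala> val rdf1 = sqlContext.sql("Select * from dev_ntcom.nc_currency_dim").collect()

If the 1.3.1 deployment happened to have these JARs on its classpath while the 1.5.1 deployment does not, that would explain the difference between the two versions; this is an assumption to verify against the actual CDH classpaths.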