Chiragkumar created SPARK-12557:
-----------------------------------

             Summary: Spark 1.5.1 is unable to read S3 file system (Java 
exception - s3a.S3AFileSystem not found) 
                 Key: SPARK-12557
                 URL: https://issues.apache.org/jira/browse/SPARK-12557
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.5.1
         Environment: AWS (EC2) instances + S3 + Hadoop CDH 
            Reporter: Chiragkumar


Hello Technical Support team, 

We are facing a critical production issue on Spark 1.5.1: it throws a Java 
runtime exception, "org.apache.hadoop.fs.s3a.S3AFileSystem not found", 
although the same workload runs perfectly on Spark 1.3.1. Is this a known 
issue in Spark 1.5.1? I have opened a case with Cloudera CDH, but they do not 
fully support this yet. We use spark-shell (Scala) heavily these days, so end 
users would prefer that environment for executing their HQL, and most of our 
datasets live in an S3 bucket. Note that there is no complaint when the same 
dataset is read from HDFS (Hadoop FS), so the problem seems to be related to 
my Spark configuration or something similar. Please help identify the root 
cause and a solution. The technical details follow:

scala> val rdf1 = sqlContext.sql("Select * from 
ntcom.nc_currency_dim").collect()
rdf1: Array[org.apache.spark.sql.Row] = 
Array([-1,UNK,UNKNOWN,UNKNOWN,0.74,1.35,1.0,1.0,DBUDAL,11-JUN-2014 
20:36:41,JHOSLE,2008-03-26 00:00:00.0,105.0,6.1,2014-06-11 20:36:41,2015-07-08 
22:10:02,N], [-1,UNK,UNKNOWN,UNKNOWN,1.0,1.0,1.0,1.0,PDHAVA,08-JUL-2015 
22:10:03,JHOSLE,2008-03-26 00:00:00.0,null,null,2015-07-08 22:10:03,3000-01-01 
00:00:00,Y], [1,DKK,Danish Krone,Danish 
Krone,0.13,7.46,0.180965147453,5.53,DBUDAL,11-JUN-2014 
20:36:41,NCBATCH,2007-01-16 00:00:00.0,19.0,1.1,2014-06-11 20:36:41,2015-07-08 
22:10:02,N], [1,DKK,Danish Krone,Danish 
Krone,0.134048257372654,7.46,0.134048257372654,7.46,PDHAVA,08-JUL-2015 
22:10:03,NCBATCH,2007-01-16 00:00:00.0,null,null,2015-07-08 22:10:03,3000-01-01 
00:00:00,Y], [2,EUR,Euro,EMU currency 
(Euro),1.0,1.0,1.35,0.74,DBUDAL,11-JUN-2014 20:36:41,NCBA...

scala> val rdf1 = sqlContext.sql("Select * from dev_ntcom.nc_currency_dim").collect()
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
        at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$2.apply(ClientWrapper.scala:303)
        at scala.Option.map(Option.scala:145)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
        ... 120 more
15/11/05 20:31:01 ERROR log: error in initSerDe: org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining schema. Returning signal schema to indicate problem: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
org.apache.hadoop.hive.serde2.SerDeException: Encountered exception determining schema. Returning signal schema to indicate problem: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs
        at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:524)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
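For reference, the missing class can be checked directly from the spark-shell before running any query. This is only a diagnostic sketch: the class name comes from the stack trace above, everything else is an assumption, not a confirmed CDH procedure.

```scala
// Diagnostic sketch: verify whether the driver classpath actually
// contains the s3a filesystem implementation named in the stack trace.
try {
  Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")
  println("S3AFileSystem is on the classpath")
} catch {
  case _: ClassNotFoundException =>
    println("S3AFileSystem is NOT on the classpath -- the hadoop-aws jar is likely missing")
}
```

If the class is missing, one commonly suggested workaround (an assumption here, paths and jar names vary by CDH layout) is to start the shell with the hadoop-aws and AWS SDK jars on the classpath, e.g. `spark-shell --jars /path/to/hadoop-aws.jar,/path/to/aws-java-sdk.jar`, and, if the s3a scheme is not already mapped, to set `spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem`.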



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
