[
https://issues.apache.org/jira/browse/SPARK-12557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073938#comment-15073938
]
Chiragkumar commented on SPARK-12557:
-------------------------------------
Thanks Sean. Could you share the names of the libraries so we can manage them
in our existing dependencies? I mean, a little more info would make this
easier to fix.
I made sure the following JARs exist on the classpath for CDH with S3
authentication (example spark-shell invocation below):
A. hadoop-common.jar (optional if already present on the classpath)
B. hadoop-aws.jar
C. aws-java-sdk.jar
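For reference, here is a minimal sketch of how I am passing these JARs and the
S3A settings to spark-shell. The parcel paths are assumptions for a typical CDH
layout (the actual jar names carry versions on my cluster), and the access/secret
key values are placeholders; the fs.s3a.* keys are handed to Hadoop via Spark's
spark.hadoop.* prefix:

  # Assumed CDH parcel layout; verify the real jar paths/versions on your cluster
  spark-shell \
    --jars /opt/cloudera/parcels/CDH/jars/hadoop-aws.jar,/opt/cloudera/parcels/CDH/jars/aws-java-sdk.jar \
    --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
    --conf spark.hadoop.fs.s3a.access.key=<your-access-key> \
    --conf spark.hadoop.fs.s3a.secret.key=<your-secret-key>

If --jars does not get the classes onto the driver classpath early enough,
pointing spark.driver.extraClassPath and spark.executor.extraClassPath at the
same JARs is the other option I am aware of.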
> Spark 1.5.1 is unable to read S3 file system (Java exception -
> s3a.S3AFileSystem not found)
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-12557
> URL: https://issues.apache.org/jira/browse/SPARK-12557
> Project: Spark
> Issue Type: Bug
> Components: EC2, PySpark
> Affects Versions: 1.5.1
> Environment: AWS (EC2) instances + S3 + Hadoop CDH
> Reporter: Chiragkumar
>
> Hello Technical Support team,
> This is a critical production issue we are facing on Spark version 1.5.1.
> It throws a Java runtime exception,
> "org.apache.hadoop.fs.s3a.S3AFileSystem" not found, although it works
> perfectly on Spark version 1.3.1. Is this a known issue on Spark 1.5.1? I
> have opened a case with Cloudera CDH but they are not fully supporting this
> yet. We use spark-shell (Scala) heavily these days, so end users prefer this
> environment to execute their HQL, and most of our datasets live in an S3
> bucket. Note that there is no complaint if the dataset is read from HDFS
> (Hadoop FS), so it seems to be related to my Spark configuration or
> something similar. Please help identify the root cause and a solution.
> Following is more technical info for review:
> scala> val rdf1 = sqlContext.sql("Select * from
> ntcom.nc_currency_dim").collect()
> rdf1: Array[org.apache.spark.sql.Row] =
> Array([-1,UNK,UNKNOWN,UNKNOWN,0.74,1.35,1.0,1.0,DBUDAL,11-JUN-2014
> 20:36:41,JHOSLE,2008-03-26 00:00:00.0,105.0,6.1,2014-06-11
> 20:36:41,2015-07-08 22:10:02,N],
> [-1,UNK,UNKNOWN,UNKNOWN,1.0,1.0,1.0,1.0,PDHAVA,08-JUL-2015
> 22:10:03,JHOSLE,2008-03-26 00:00:00.0,null,null,2015-07-08
> 22:10:03,3000-01-01 00:00:00,Y], [1,DKK,Danish Krone,Danish
> Krone,0.13,7.46,0.180965147453,5.53,DBUDAL,11-JUN-2014
> 20:36:41,NCBATCH,2007-01-16 00:00:00.0,19.0,1.1,2014-06-11
> 20:36:41,2015-07-08 22:10:02,N], [1,DKK,Danish Krone,Danish
> Krone,0.134048257372654,7.46,0.134048257372654,7.46,PDHAVA,08-JUL-2015
> 22:10:03,NCBATCH,2007-01-16 00:00:00.0,null,null,2015-07-08
> 22:10:03,3000-01-01 00:00:00,Y], [2,EUR,Euro,EMU currency
> (Euro),1.0,1.0,1.35,0.74,DBUDAL,11-JUN-2014 20:36:41,NCBA...
> rdf1 = sqlContext.sql("Select * from dev_ntcom.nc_currency_dim").collect()
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
> at
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
> at
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$2.apply(ClientWrapper.scala:303)
> at scala.Option.map(Option.scala:145)
> Caused by: java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
> at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
> ... 120 more
> 15/11/05 20:31:01 ERROR log: error in initSerDe:
> org.apache.hadoop.hive.serde2.SerDeException Encountered exception
> determining schema. Returning signal schema to indicate problem:
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> org.apache.hadoop.hive.serde2.SerDeException: Encountered exception
> determining schema. Returning signal schema to indicate problem:
> java.lang.ClassNotFoundException: Class org.apache.hadoop.fs
> at
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:524)
> at
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
--