[
https://issues.apache.org/jira/browse/SPARK-12557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073938#comment-15073938
]
Chiragkumar commented on SPARK-12557:
-------------------------------------
Thanks Sean. Could you share the names of the libraries so we can manage them
in our existing dependencies? I mean, a little more info would make this
easier to fix.
I made sure the following JARs exist on the classpath for CDH with S3
authentication (example spark-shell invocation below):
A. hadoop-common.jar (optional if already present on the classpath)
B. hadoop-aws.jar
C. aws-java-sdk.jar
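For reference, here is a minimal sketch of how I am passing these JARs and the
S3A settings to spark-shell. The parcel paths are assumptions for a typical CDH
layout (the actual jar names carry versions on my cluster), and the access/secret
key values are placeholders; the fs.s3a.* keys are handed to Hadoop via Spark's
spark.hadoop.* prefix:

  # Assumed CDH parcel layout; verify the real jar paths/versions on your cluster
  spark-shell \
    --jars /opt/cloudera/parcels/CDH/jars/hadoop-aws.jar,/opt/cloudera/parcels/CDH/jars/aws-java-sdk.jar \
    --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
    --conf spark.hadoop.fs.s3a.access.key=<your-access-key> \
    --conf spark.hadoop.fs.s3a.secret.key=<your-secret-key>

If --jars does not get the classes onto the driver classpath early enough,
pointing spark.driver.extraClassPath and spark.executor.extraClassPath at the
same JARs is the other option I am aware of.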
> Spark 1.5.1 is unable to read S3 file system (Java exception -
> s3a.S3AFileSystem not found)
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-12557
> URL: https://issues.apache.org/jira/browse/SPARK-12557
> Project: Spark
> Issue Type: Bug
> Components: EC2, PySpark
> Affects Versions: 1.5.1
> Environment: AWS (EC2) instances + S3 + Hadoop CDH
> Reporter: Chiragkumar
>
> Hello Technical Support team,
> This is a critical production issue we are facing on Spark version 1.5.1.
> It throws a Java runtime exception,
> "org.apache.hadoop.fs.s3a.S3AFileSystem" not found, although it works
> perfectly on Spark version 1.3.1. Is this a known issue on Spark 1.5.1? I
> have opened a case with Cloudera CDH but they are not fully supporting this
> yet. We use spark-shell (Scala) heavily these days, so end users prefer this
> environment to execute their HQL, and most of our datasets live in an S3
> bucket. Note that there is no complaint if the dataset is read from HDFS
> (Hadoop FS), so it seems to be related to my Spark configuration or
> something similar. Please help identify the root cause and a solution.
> Following is more technical info for review:
> scala> val rdf1 = sqlContext.sql("Select * from
> ntcom.nc_currency_dim").collect()
> rdf1: Array[org.apache.spark.sql.Row] =
> Array([-1,UNK,UNKNOWN,UNKNOWN,0.74,1.35,1.0,1.0,DBUDAL,11-JUN-2014
> 20:36:41,JHOSLE,2008-03-26 00:00:00.0,105.0,6.1,2014-06-11
> 20:36:41,2015-07-08 22:10:02,N],
> [-1,UNK,UNKNOWN,UNKNOWN,1.0,1.0,1.0,1.0,PDHAVA,08-JUL-2015
> 22:10:03,JHOSLE,2008-03-26 00:00:00.0,null,null,2015-07-08
> 22:10:03,3000-01-01 00:00:00,Y], [1,DKK,Danish Krone,Danish
> Krone,0.13,7.46,0.180965147453,5.53,DBUDAL,11-JUN-2014
> 20:36:41,NCBATCH,2007-01-16 00:00:00.0,19.0,1.1,2014-06-11
> 20:36:41,2015-07-08 22:10:02,N], [1,DKK,Danish Krone,Danish
> Krone,0.134048257372654,7.46,0.134048257372654,7.46,PDHAVA,08-JUL-2015
> 22:10:03,NCBATCH,2007-01-16 00:00:00.0,null,null,2015-07-08
> 22:10:03,3000-01-01 00:00:00,Y], [2,EUR,Euro,EMU currency
> (Euro),1.0,1.0,1.35,0.74,DBUDAL,11-JUN-2014 20:36:41,NCBA...
> rdf1 = sqlContext.sql("Select * from dev_ntcom.nc_currency_dim").collect()
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
> at
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
> at
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$2.apply(ClientWrapper.scala:303)
> at scala.Option.map(Option.scala:145)
> Caused by: java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
> at
> org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
> ... 120 more
> 15/11/05 20:31:01 ERROR log: error in initSerDe:
> org.apache.hadoop.hive.serde2.SerDeException Encountered exception
> determining schema. Returning signal schema to indicate problem:
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found
> org.apache.hadoop.hive.serde2.SerDeException: Encountered exception
> determining schema. Returning signal schema to indicate problem:
> java.lang.ClassNotFoundException: Class org.apache.hadoop.fs
> at
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:524)
> at
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
--