Yep it's a Hadoop issue: https://issues.apache.org/jira/browse/HADOOP-11863
http://mail-archives.apache.org/mod_mbox/hadoop-user/201504.mbox/%3CCA+XUwYxPxLkfhOxn1jNkoUKEQQMcPWFzvXJ=u+kp28kdejo...@mail.gmail.com%3E
http://stackoverflow.com/a/28033408/3271168
So for now you need to manually add that jar to the classpath on hadoop-2.6.
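A minimal sketch of that workaround (the Maven Central path and the spark-submit flags below are assumptions for illustration; match the jar version to the Hadoop version your Spark build was compiled against):

```shell
# Fetch the hadoop-aws jar matching your Hadoop build (2.6.0 here).
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar

# Put it on the driver classpath and ship it to the executors.
# my_job.py is a placeholder for your own application.
spark-submit \
  --driver-class-path hadoop-aws-2.6.0.jar \
  --jars hadoop-aws-2.6.0.jar \
  my_job.py
```

Depending on the Hadoop version, hadoop-aws may also pull in a matching aws-java-sdk jar; if you see a ClassNotFoundException for com.amazonaws classes after this, add that jar the same way.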
Thanks,
Peter Rudenko
On 2015-05-07 19:41, Nicholas Chammas wrote:
I can try that, but the issue is that this is supposed to work out of the box (like it does with all the other Spark/Hadoop pre-built packages).
On Thu, May 7, 2015 at 12:35 PM Peter Rudenko <petro.rude...@gmail.com> wrote:
Try to download this jar:
http://search.maven.org/remotecontent?filepath=org/apache/hadoop/hadoop-aws/2.6.0/hadoop-aws-2.6.0.jar
And add:
export CLASSPATH=$CLASSPATH:hadoop-aws-2.6.0.jar
And try to relaunch.
Thanks,
Peter Rudenko
On 2015-05-07 19:30, Nicholas Chammas wrote:
Hmm, I just tried changing s3n to s3a:

py4j.protocol.Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.collectAndServe. :
java.lang.RuntimeException: java.lang.ClassNotFoundException:
Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
Nick
On Thu, May 7, 2015 at 12:29 PM Peter Rudenko <petro.rude...@gmail.com> wrote:
Hi Nick, had the same issue.
By default it should work with s3a protocol:
sc.textFile('s3a://bucket/file_*').count()
If you want to use the s3n protocol you need to add hadoop-aws.jar to Spark's classpath. Which Hadoop vendor (Hortonworks, Cloudera, MapR) do you use?
Thanks,
Peter Rudenko
On 2015-05-07 19:25, Nicholas Chammas wrote:
Details are here: https://issues.apache.org/jira/browse/SPARK-7442
It looks like something specific to building against Hadoop 2.6?
Nick