[ https://issues.apache.org/jira/browse/SPARK-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169013#comment-15169013 ]

Michel Lemay commented on SPARK-13044:
--------------------------------------

I get something similar. When trying to read a KMS-encrypted file from S3, I 
get an error about AWS Signature Version 4.

When running a locally compiled tip of the master branch (2.0.0-SNAPSHOT), I 
get a more verbose error message about the signature version:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Service Error Message. -- ResponseCode: 400, ResponseStatus: Bad Request, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidArgument</Code><Message>{color:red}Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.{color}</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>null</ArgumentValue><RequestId>...</RequestId><HostId>...</HostId></Error>
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:464)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:210)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem.open(NativeS3FileSystem.java:627)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:248)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:69)
        at org.apache.spark.scheduler.Task.run(Task.scala:81)
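
For what it's worth, a possible workaround, sketched below and not something I 
have verified end to end, is to bypass the jets3t-backed s3n connector and use 
s3a instead, which goes through the AWS SDK and can sign requests with 
Signature Version 4. The bucket/key values are placeholders, and it assumes 
the hadoop-aws jar plus a matching aws-java-sdk are on the classpath:

{code:java}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: read through s3a (AWS SDK based) instead of s3n (jets3t based).
// Assumes hadoop-aws and a matching aws-java-sdk jar on the classpath;
// the master is expected to come from spark-submit.
val sc = new SparkContext(new SparkConf().setAppName("s3a-sigv4-sketch"))

val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", "ACCESS_KEY")   // placeholder
hadoopConf.set("fs.s3a.secret.key", "SECRET_KEY")   // placeholder
// Pin the region endpoint; eu-central-1 (Frankfurt) accepts SigV4 only.
hadoopConf.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
// Some AWS SDK versions reportedly also need
// -Dcom.amazonaws.services.s3.enableV4=true on driver and executors.

// Reading the KMS-encrypted object should now be signed with SigV4.
val rdd = sc.textFile("s3a://bucket_name/data/12345")
println(rdd.count())
{code}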


Under Spark 1.6.0, the same failure surfaces as an NPE (presumably because the 
failed S3 request leaves the underlying input stream null before seek() is 
called):

java.lang.NullPointerException
        at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:152)
        at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:89)
        at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:126)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
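
If staying on s3n, another workaround that has been floated is a 
jets3t.properties file on the driver and executor classpath that forces V4 
signing and the region endpoint. The property names below are assumed from 
jets3t 0.9.4+ and should be verified against the jets3t version actually in 
use:

{code}
# jets3t.properties, placed on the classpath of driver and executors.
# Property names assumed from jets3t 0.9.4+; verify before relying on them.
s3service.s3-endpoint=s3.eu-central-1.amazonaws.com
storage-service.request-signature-version=AWS4-HMAC-SHA256
{code}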

> saveAsTextFile() doesn't support s3 Signature Version 4
> -------------------------------------------------------
>
>                 Key: SPARK-13044
>                 URL: https://issues.apache.org/jira/browse/SPARK-13044
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.4.0
>         Environment: CentOS
>            Reporter: Xin Ren
>              Labels: aws-s3
>
> I have two clusters deployed on AWS with the same configs: one in US, one in 
> EU (Frankfurt).
> The application in the Frankfurt cluster cannot save data to Frankfurt S3, 
> but the US one can save to US S3.
> I checked and found that the Frankfurt region supports Signature Version 4 
> only:
> http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
> Code I'm using:
> {code:java}
> val s3WriteEndpoint = "s3n://access_key:secret_key@bucket_name/data/12345"
> rdd.saveAsTextFile(s3WriteEndpoint)
> {code}
> So from this I guess saveAsTextFile() is using Signature Version 2? How can 
> Version 4 be supported?
> I tried to dig into the code:
> https://github.com/apache/spark/blob/f14922cff84b1e0984ba4597d764615184126bdc/core/src/main/scala/org/apache/spark/rdd/RDD.scala


