[ https://issues.apache.org/jira/browse/SPARK-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169013#comment-15169013 ]
Michel Lemay commented on SPARK-13044:
--------------------------------------
I get something similar. When trying to read a KMS-encrypted file from S3, I get
an error about AWS Signature Version 4.
When running a locally compiled tip of the master branch (2.0.0-SNAPSHOT), I get
a verbose error message about the signature version:
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Service Error Message. -- ResponseCode: 400, ResponseStatus: Bad Request, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidArgument</Code><Message>{color:red}Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.{color}</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>null</ArgumentValue><RequestId>...</RequestId><HostId>...</HostId></Error>
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:464)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieve(Jets3tNativeFileSystemStore.java:210)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at org.apache.hadoop.fs.s3native.$Proxy17.retrieve(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.open(NativeS3FileSystem.java:627)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:248)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:69)
    at org.apache.spark.scheduler.Task.run(Task.scala:81)
Under Spark 1.6.0, the same failure surfaces as an NPE:
java.lang.NullPointerException
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.seek(NativeS3FileSystem.java:152)
    at org.apache.hadoop.fs.BufferedFSInputStream.seek(BufferedFSInputStream.java:89)
    at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:126)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
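For what it's worth, the s3n connector delegates to jets3t, and jets3t can be told to sign with Version 4 through a jets3t.properties file on the classpath. A sketch, assuming the Hadoop build bundles jets3t 0.9.4 or later (earlier releases do not recognize these properties):
{code}
# jets3t.properties, placed on the driver and executor classpath (sketch)
# Force AWS Signature Version 4 signing (requires jets3t 0.9.4+)
storage-service.request-signature-version=AWS4-HMAC-SHA256
# V4 signing requires a region-specific endpoint instead of the global one
s3service.s3-endpoint=s3.eu-central-1.amazonaws.com
{code}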
> saveAsTextFile() doesn't support s3 Signature Version 4
> -------------------------------------------------------
>
> Key: SPARK-13044
> URL: https://issues.apache.org/jira/browse/SPARK-13044
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 1.4.0
> Environment: CentOS
> Reporter: Xin Ren
> Labels: aws-s3
>
> I have two clusters deployed on AWS with identical configs: one in US and one
> in EU-Frankfurt.
> The application in EU-Frankfurt cannot save data to the EU-Frankfurt S3
> bucket, while the US one can save to US S3.
> I checked and found that EU-Frankfurt supports Signature Version 4 only:
> http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
> Code I'm using:
> {code:java}
> val s3WriteEndpoint = "s3n://access_key:secret_key@bucket_name/data/12345"
> rdd.saveAsTextFile(s3WriteEndpoint)
> {code}
> So from this behaviour I guess saveAsTextFile() is using Signature Version 2?
> How can I enable Version 4?
> I tried to dig into the code:
> https://github.com/apache/spark/blob/f14922cff84b1e0984ba4597d764615184126bdc/core/src/main/scala/org/apache/spark/rdd/RDD.scala
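> A possible workaround (a sketch, not verified against this cluster): if the
> hadoop-aws s3a connector and the AWS SDK are on the classpath, s3a can be
> pointed at the region's V4-capable endpoint instead of going through s3n.
> The bucket path and the environment-variable credential lookup below are
> placeholders:
> {code:java}
> // Sketch: s3a with an explicit eu-central-1 endpoint (Signature V4 capable)
> val conf = sc.hadoopConfiguration
> conf.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
> conf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
> conf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
> // Write via the s3a scheme, keeping credentials out of the URL
> rdd.saveAsTextFile("s3a://bucket_name/data/12345")
> {code}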
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)