Aseem Bansal created SPARK-17307:
------------------------------------
Summary: Document what all access is needed on S3 bucket when trying to save a model
Key: SPARK-17307
URL: https://issues.apache.org/jira/browse/SPARK-17307
Project: Spark
Issue Type: Documentation
Reporter: Aseem Bansal
I ran into this lack of documentation while trying to save a model to S3.
Initially I assumed only write access was needed. Then I found that delete
access is also required, to clean up temporary files. After requesting delete
access and trying again, I got the error:
Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception:
org.jets3t.service.S3ServiceException: S3 PUT failed for '/dev-qa_%24folder%24'
XML Error Message
The error can be reproduced with the following code:
{code}
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.sql.SparkSession;

SparkSession sparkSession = SparkSession
    .builder()
    .appName("my app")
    .master("local")
    .getOrCreate();

JavaSparkContext jsc = new JavaSparkContext(sparkSession.sparkContext());
jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", <ACCESS_KEY>);
jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", <SECRET_ACCESS_KEY>);

// Create a PipelineModel, then save it to S3
pipelineModel.write().overwrite().save("s3n://<BUCKET>/dev-qa/modelTest");
{code}
This back and forth could be avoided if the documentation clearly listed
which S3 permissions Spark needs in order to write to a bucket. It would also
be great if it explained why each permission is needed.
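
For what it's worth, a bucket policy along the following lines appears to
cover the operations described above (read, write, and the delete used for
temporary-file cleanup). This is only a sketch using standard IAM S3 actions;
I have not verified that it is the exact minimal set, and <BUCKET> is a
placeholder:

{code}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SparkModelSaveObjects",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::<BUCKET>/*"
    },
    {
      "Sid": "SparkModelSaveList",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<BUCKET>"
    }
  ]
}
{code}

Note that s3n also writes folder-marker objects (the {{_$folder$}} keys seen
in the error above), which the object-level statement should cover.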
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)