[
https://issues.apache.org/jira/browse/SPARK-35974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390198#comment-17390198
]
t oo edited comment on SPARK-35974 at 7/29/21, 11:38 PM:
---------------------------------------------------------
Same issue on Spark 3.1.2:
{code:java}
{
"action" : "SubmissionStatusResponse",
"driverState" : "ERROR",
"message" : "Exception from the
cluster:\njava.nio.file.AccessDeniedException:
s3a://redact/ingestion-0.5.2-SNAPSHOT.jar: getFileStatus on
s3a://redact/ingestion-0.5.2-SNAPSHOT.jar:
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon
S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: hidden; S3
Extended Request ID: hideit), S3 Extended Request ID:
hideit\n\torg.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)\n\torg.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)\n\torg.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542)\n\torg.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117)\n\torg.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463)\n\torg.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030)\n\torg.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:799)\n\torg.apache.spark.util.Utils$.doFetchFile(Utils.scala:776)\n\torg.apache.spark.util.Utils$.fetchFile(Utils.scala:541)\n\torg.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:162)\n\torg.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:180)\n\torg.apache.spark.deploy.worker.DriverRunner$$anon$2.run(DriverRunner.scala:99)",
"serverSparkVersion" : "3.1.2",
"submissionId" : "driver-20210729233253-0001",
"success" : true,
"workerHostPort" : "10.redact:17537",
"workerId" : "worker-20210729232355-10.redact-17537"
}
{code}
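Worth noting when scripting against the standalone REST API: the status response above reports {{"success" : true}} even though {{"driverState"}} is {{"ERROR"}}. The {{success}} flag only means the status lookup itself succeeded, so automation has to inspect {{driverState}} to detect a crashed driver. A minimal sketch (the response body is the redacted one from this comment; the helper name is mine, not a Spark API):

```python
import json

def driver_failed(status: dict) -> bool:
    """Return True when the driver itself ended badly.

    "success" in a SubmissionStatusResponse only reports that the status
    lookup worked, so the driver outcome must be read from "driverState".
    """
    return status.get("driverState") in ("ERROR", "FAILED", "KILLED")

# The (redacted) response body from the comment above, minus the long message:
response = json.loads("""{
  "action" : "SubmissionStatusResponse",
  "driverState" : "ERROR",
  "serverSparkVersion" : "3.1.2",
  "submissionId" : "driver-20210729233253-0001",
  "success" : true
}""")

print(driver_failed(response))  # prints True: the driver crashed despite "success" : true
```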
> Spark submit REST cluster/standalone mode - launching an s3a jar with STS
> -------------------------------------------------------------------------
>
> Key: SPARK-35974
> URL: https://issues.apache.org/jira/browse/SPARK-35974
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.2
> Reporter: t oo
> Priority: Major
>
> {code:java}
> /var/lib/spark-2.4.8-bin-hadoop2.7/bin/spark-submit --master
> spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf
> spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf
> spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf
> spark.hadoop.fs.s3a.secret.key='redact2' --conf
> spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf
> spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf
> spark.hadoop.fs.s3a.session.token='redact3' --conf
> spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf
> spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf
> spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
> --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1
> -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf
> spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1
> -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3'
> --total-executor-cores 4 --executor-cores 2 --executor-memory 2g
> --driver-memory 1g --name lin1 --deploy-mode cluster --conf
> spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku
> s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml
> {code}
> Running the above command gives the stack trace below:
>
> {code:java}
> Exception from the cluster:\njava.nio.file.AccessDeniedException:
> s3a://mybuc/metorikku_2.11.jar: getFileStatus on
> s3a://mybuc/metorikku_2.11.jar:
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon
> S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended
> Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=\n\
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542)
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117)
> org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463)
> org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030)
> org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747)
> org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723)
> org.apache.spark.util.Utils$.fetchFile(Utils.scala:509)
> org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155)
> org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
> org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code}
> All the EC2s in the Spark cluster have access to S3 only via STS tokens. The
> jar itself reads CSVs from S3 using those tokens, and everything works if I
> either 1. change the command line to point to local jars on the EC2s, OR 2.
> use port 7077/client mode instead of cluster mode. But the jar itself cannot
> be launched off S3 in REST cluster mode, as if the worker fetching the jar
> does not pick up the STS tokens.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]