[
https://issues.apache.org/jira/browse/SPARK-35974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun resolved SPARK-35974.
-----------------------------------
Resolution: Cannot Reproduce
Could you try Apache Spark 3.1.2, please, [~toopt4]? Apache Spark 2.4 is EOL.
The log shows `spark-2.3.4-bin-hadoop2.7`, while the affected version is listed
as 2.4.6; both are too old.
> Spark submit REST cluster/standalone mode - launching an s3a jar with STS
> -------------------------------------------------------------------------
>
> Key: SPARK-35974
> URL: https://issues.apache.org/jira/browse/SPARK-35974
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.6
> Reporter: t oo
> Priority: Major
>
> {code:java}
> /var/lib/spark-2.3.4-bin-hadoop2.7/bin/spark-submit --master
> spark://myhost:6066 --conf spark.hadoop.fs.s3a.access.key='redact1' --conf
> spark.executorEnv.AWS_ACCESS_KEY_ID='redact1' --conf
> spark.driverEnv.AWS_ACCESS_KEY_ID='redact1' --conf
> spark.hadoop.fs.s3a.secret.key='redact2' --conf
> spark.executorEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf
> spark.driverEnv.AWS_SECRET_ACCESS_KEY='redact2' --conf
> spark.hadoop.fs.s3a.session.token='redact3' --conf
> spark.executorEnv.AWS_SESSION_TOKEN='redact3' --conf
> spark.driverEnv.AWS_SESSION_TOKEN='redact3' --conf
> spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
> --conf spark.driver.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1
> -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3' --conf
> spark.executor.extraJavaOptions='-DAWS_ACCESS_KEY_ID=redact1
> -DAWS_SECRET_ACCESS_KEY=redact2 -DAWS_SESSION_TOKEN=redact3'
> --total-executor-cores 4 --executor-cores 2 --executor-memory 2g
> --driver-memory 1g --name lin1 --deploy-mode cluster --conf
> spark.eventLog.enabled=false --class com.yotpo.metorikku.Metorikku
> s3a://mybuc/metorikku_2.11.jar -c s3a://mybuc/spark_ingestion_job.yaml
> {code}
> Running the above command gives the stack trace below:
>
> {code:java}
> Exception from the cluster:
> java.nio.file.AccessDeniedException: s3a://mybuc/metorikku_2.11.jar:
> getFileStatus on s3a://mybuc/metorikku_2.11.jar:
> com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon
> S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xx; S3 Extended
> Request ID: /1qj/yy=), S3 Extended Request ID: /1qj/yy=
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1542)
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117)
> org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1463)
> org.apache.hadoop.fs.s3a.S3AFileSystem.isFile(S3AFileSystem.java:2030)
> org.apache.spark.util.Utils$.fetchHcfsFile(Utils.scala:747)
> org.apache.spark.util.Utils$.doFetchFile(Utils.scala:723)
> org.apache.spark.util.Utils$.fetchFile(Utils.scala:509)
> org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155)
> org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
> org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92){code}
> All the EC2s in the Spark cluster have access to S3 only via STS tokens. The
> jar itself reads CSVs from S3 using the tokens, and everything works if
> either (1) I change the command line to point to local jars on the EC2, or
> (2) I use port 7077/client mode instead of cluster mode. But it seems the jar
> itself can't be launched from S3, as if the tokens are not being picked up
> properly.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]