[ 
https://issues.apache.org/jira/browse/SPARK-13979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-13979:
---------------------------------
    Labels: bulk-closed  (was: )

> Killed executor is respawned without AWS keys in standalone spark cluster
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13979
>                 URL: https://issues.apache.org/jira/browse/SPARK-13979
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.2
>         Environment: I'm using Spark 1.5.2 with Hadoop 2.7 and running 
> experiments on a simple standalone cluster:
> 1 master
> 2 workers
> All ubuntu 14.04 with Java 8/Scala 2.10
>            Reporter: Allen George
>            Priority: Major
>              Labels: bulk-closed
>
> I'm having a problem where respawning a failed executor during a job that 
> reads/writes parquet on S3 causes subsequent tasks to fail because of missing 
> AWS keys.
> h4. Setup:
> I'm using Spark 1.5.2 with Hadoop 2.7 and running experiments on a simple 
> standalone cluster:
> 1 master
> 2 workers
> My application is co-located on the master machine, while the two workers are 
> on two other machines (one worker per machine). All machines are running in 
> EC2. I've configured my setup so that my application executes its task on two 
> executors (one executor per worker).
> h4. Application:
> My application reads and writes parquet files on S3. I set the AWS keys on 
> the SparkContext by doing:
> val sc = new SparkContext()
> val hadoopConf = sc.hadoopConfiguration
> hadoopConf.set("fs.s3n.awsAccessKeyId", "SOME_KEY")
> hadoopConf.set("fs.s3n.awsSecretAccessKey", "SOME_SECRET")
> At this point I'm done, and I go ahead and use "sc".
> h4. Issue:
> I can read and write parquet files without a problem with this setup. *BUT* 
> if an executor dies during a job and is respawned by a worker, tasks fail 
> with the following error:
> "Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret 
> Access Key must be specified as the username or password (respectively) of a 
> s3n URL, or by setting the {{fs.s3n.awsAccessKeyId}} or 
> {{fs.s3n.awsSecretAccessKey}} properties (respectively)."
> h4. Basic analysis
> I think I've traced this down to the following:
> SparkHadoopUtil is initialized with an empty {{SparkConf}}. Later, classes 
> like {{DataSourceStrategy}} simply call {{SparkHadoopUtil.get.conf}} and 
> access the (now invalid; missing various properties) {{HadoopConfiguration}} 
> that's built from this empty {{SparkConf}} object. It's unclear to me why 
> this is done, and it seems that the code as written would cause broken 
> results anytime callers use {{SparkHadoopUtil.get.conf}} directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to