[ 
https://issues.apache.org/jira/browse/SPARK-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392963#comment-14392963
 ] 

Cheolsoo Park commented on SPARK-6662:
--------------------------------------

[~srowen], thank you for your comment.
{quote}
Wouldn't you be able to query for the YARN RM address somewhere and include it 
in the config?
{quote}
In a typical cloud deployment, there is usually a shared gateway from which users 
connect to various clusters, and a few Spark configs are shared by all the 
clusters. Furthermore, clusters are usually transient in the cloud, so I'd like 
to avoid adding any cluster-specific information to Spark configs.

My current workaround is grep'ing {{yarn.resourcemanager.hostname}} out of 
yarn-site.xml in my custom job launch script on the gateway and passing it via 
the {{--conf}} option on every job launch. The intention of this jira was to get 
rid of that hacky bit in my launch script.
{quote}
I am somewhat concerned about adding a narrow bit of support for one particular 
substitution, which in turn is to support a specific assumption in one type of 
deployment.
{quote}
Yes, I understand your concern. Even though I have a specific problem to solve 
at hand, I filed this jira hoping that general variable substitution would be 
added to the Spark config. In fact, I made an attempt in that direction but 
quickly ran into the following problems:
# Adding general vars sub to the Spark conf doesn't solve my problem. Since the 
Spark config and the YARN config are separate entities in Spark, I cannot 
cross-refer to properties from one in the other.
# Alternatively, I could introduce special logic for 
{{spark.yarn.historyServer.address}} assuming the RM and HS are on the same 
node. Since the Spark AM already knows the RM address, this is trivial to 
implement, but it makes an even more specific assumption about the deployment.

It looks to me like implementing general vars sub that allows cross-referring 
would involve quite a bit of refactoring.

So I compromised: I introduced vars sub only for the {{spark.yarn.}} 
properties. In fact, vars sub already works for {{spark.hadoop.}} properties. 
If you look at the code, all the {{spark.hadoop.}} properties are copied over 
to the YARN config and read back through it, so as a side effect they support 
vars sub. I am just expanding the scope of this *secret* feature to the 
{{spark.yarn.}} properties.
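For readers unfamiliar with that *secret* feature, the following is a rough sketch (not Hadoop's actual code) of the kind of {{$\{var\}}} expansion that Hadoop's {{Configuration.get()}} performs, which is what {{spark.hadoop.}} properties pick up for free once they are copied into the YARN config. The bounded recursion depth mirrors the fact that Hadoop caps substitution rounds; the exact behavior here is an approximation.

```python
import re

_VAR = re.compile(r"\$\{([^}]+)\}")  # matches ${some.property.name}

def expand(props, value, max_rounds=20):
    """Repeatedly replace ${name} references in value using the property map.

    Unresolvable references are left in place, roughly as Hadoop does.
    """
    for _ in range(max_rounds):  # bounded, to avoid infinite self-reference
        m = _VAR.search(value)
        if m is None:
            return value
        repl = props.get(m.group(1))
        if repl is None:
            return value  # leave unresolved ${...} as-is
        value = value[:m.start()] + repl + value[m.end():]
    return value
```

With this in place, a value like {{$\{yarn.resourcemanager.hostname\}:18080}} resolves against whatever the cluster's yarn-site.xml declares, which is precisely the behavior this jira asks to extend to {{spark.yarn.}} properties.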

For now, I can live with my current workaround. But I wanted to point out that 
it is not user-friendly to ask users to pass an explicit hostname and port 
number to make use of the HS. In fact, I'm not aware of any other property that 
causes the same pain in YARN mode. For example, the RM address for 
{{spark.master}} is dynamically picked up from yarn-site.xml; the HS address 
should be handled in a similar manner, IMO.

Hope this explains my thought process well enough.

> Allow variable substitution in spark.yarn.historyServer.address
> ---------------------------------------------------------------
>
>                 Key: SPARK-6662
>                 URL: https://issues.apache.org/jira/browse/SPARK-6662
>             Project: Spark
>          Issue Type: Wish
>          Components: YARN
>    Affects Versions: 1.3.0
>            Reporter: Cheolsoo Park
>            Priority: Minor
>              Labels: yarn
>
> In Spark on YARN, an explicit hostname and port number need to be set for 
> "spark.yarn.historyServer.address" in SparkConf to make the HISTORY link work. 
> If the history server address is known and static, this is usually not a 
> problem. But in the cloud, that is usually not true. Particularly in EMR, the 
> history server always runs on the same node as the RM. So I could simply set 
> it to {{$\{yarn.resourcemanager.hostname\}:18080}} if variable substitution 
> were allowed.
> In fact, Hadoop configuration already implements variable substitution, so if 
> this property were read via YarnConf, this would be easily achievable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
