[ 
https://issues.apache.org/jira/browse/LIVY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808671#comment-16808671
 ] 

Kim Hammar commented on LIVY-581:
---------------------------------

PR: https://github.com/apache/incubator-livy/pull/165

> Edge-case where spark properties are overridden by Livy in YARN environments
> ----------------------------------------------------------------------------
>
>                 Key: LIVY-581
>                 URL: https://issues.apache.org/jira/browse/LIVY-581
>             Project: Livy
>          Issue Type: Bug
>            Reporter: Kim Hammar
>            Priority: Minor
>             Fix For: 0.7.0
>
>   Original Estimate: 1h
>          Time Spent: 10m
>  Remaining Estimate: 50m
>
> We use Livy inside our multi-tenant data science platform, which runs on 
> YARN and HDFS. Recently we added support for SparkSQL on Hive by placing the 
> necessary jar files in spark/jars, adding hive-site.xml to spark/conf, and 
> setting livy.repl.enableHiveContext=true in livy.conf.
> However, yesterday I discovered that when Livy starts the Spark session it 
> overrides our values for spark.yarn.dist.files and spark.yarn.jars. This 
> was never an issue before we enabled Hive. Looking into the code, I found 
> that if Hive is enabled, Livy appends hive-site.xml (if it is not already 
> present) to the list of files specified by the user in the spark.files 
> property, and appends the necessary Hive jars to the list of jars specified 
> by the user request in the spark.jars property. See the related code 
> snippet here:
> [https://github.com/apache/incubator-livy/blob/56c76bc2d4563593edce062a563603fe63e5a431/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala#L285]
> What seems to happen is that if all of spark.files, spark.jars, 
> spark.yarn.dist.files, and spark.yarn.jars are non-null when the job is 
> submitted (spark.files and spark.jars filled in by Livy, 
> spark.yarn.dist.files and spark.yarn.jars filled in by the user request from 
> our platform), *_spark.yarn.dist.files gets set to spark.files and 
> spark.yarn.jars gets set to spark.jars_*.
> Since, for example, spark.files and spark.yarn.dist.files have the same 
> semantics but are meant for non-YARN and YARN deployments, respectively, 
> Spark simply overwrites spark.yarn.dist.files with the contents of 
> spark.files. In general, these configuration properties should be treated as 
> mutually exclusive: do not mix them, because one is designed for YARN mode 
> and the other for non-YARN mode.
> Our current solution is to deploy a fork of Livy on our platform in which I 
> check whether the user request has populated the spark.yarn.X properties. If 
> so, I append all Livy-generated properties to the YARN ones; otherwise I 
> append them to the regular spark.X properties. See the code snippet here:
> https://github.com/Limmen/incubator-livy/commit/aa06f896753ae9d6ce6aa66a80cca36a82f84202
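
The workaround described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code from the fork or the PR (which is Scala). The function names `append_unique` and `merge_livy_properties`, the plain-dict representation of the session configuration, and the sample paths are all assumptions made for the example.

```python
def append_unique(existing, additions):
    """Append entries to a comma-separated list, skipping ones already present."""
    entries = [e for e in (existing or "").split(",") if e]
    for item in additions:
        if item not in entries:
            entries.append(item)
    return ",".join(entries)


def merge_livy_properties(user_conf, livy_files, livy_jars):
    """Return a copy of user_conf with the Livy-generated entries merged in.

    If the user request populated the YARN-specific property, the
    Livy-generated entries are appended there; otherwise they go to the
    regular spark.X property. The two are never both filled in by this code.
    """
    conf = dict(user_conf)
    # (Livy-generated entries, regular key, YARN-specific key)
    pairs = (
        (livy_files, "spark.files", "spark.yarn.dist.files"),
        (livy_jars, "spark.jars", "spark.yarn.jars"),
    )
    for additions, spark_key, yarn_key in pairs:
        if not additions:
            continue
        target = yarn_key if conf.get(yarn_key) else spark_key
        conf[target] = append_unique(conf.get(target), additions)
    return conf


# Hypothetical request that already uses the YARN-specific properties.
request_conf = {
    "spark.yarn.dist.files": "hdfs:///project/data.csv",
    "spark.yarn.jars": "hdfs:///project/lib.jar",
}
merged = merge_livy_properties(
    request_conf,
    livy_files=["/srv/spark/conf/hive-site.xml"],
    livy_jars=["/srv/spark/jars/hive-exec.jar"],
)
for key in sorted(merged):
    print(key, "=", merged[key])
```

With a request like the one above, the Hive entries end up in the spark.yarn.X properties and spark.files and spark.jars stay unset, so there is nothing for Spark to overwrite the user's values with.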



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
