Kim Hammar created LIVY-581:
-------------------------------

             Summary: Edge case where Spark properties are overridden by Livy in 
YARN environments
                 Key: LIVY-581
                 URL: https://issues.apache.org/jira/browse/LIVY-581
             Project: Livy
          Issue Type: Bug
            Reporter: Kim Hammar
             Fix For: 0.7.0


We use Livy inside our multi-tenant data science platform that is running on 
YARN and HDFS. Recently we added support for Spark SQL on Hive by placing the 
necessary jar files in spark/jars, adding hive-site.xml in spark/conf, and 
setting livy.repl.enableHiveContext=true in livy.conf.
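For reference, the enablement boils down to the following (paths illustrative for a default layout):

```
# spark/jars:  Hive client jars copied in
# spark/conf:  hive-site.xml added
# livy.conf:
livy.repl.enableHiveContext = true
```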

However, yesterday I discovered that when Livy starts the Spark session it 
overrides our properties in spark.yarn.dist.files and spark.yarn.jars; this was 
never an issue before we enabled Hive. Looking into the code, I found that when 
Hive is enabled, Livy appends (if not already present) hive-site.xml to the 
list of files specified by the user in the spark.files property, and the 
necessary Hive jars to the list of jars specified by the user request in the 
spark.jars property; see the related code snippet here:

[https://github.com/apache/incubator-livy/blob/56c76bc2d4563593edce062a563603fe63e5a431/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala#L285]
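The appending behavior can be sketched roughly as follows (illustrative names, not Livy's exact code): the Hive files/jars are merged into the plain spark.* property, skipping entries the user already listed.

```scala
// Merge extra entries into a comma-separated Spark conf value,
// skipping entries that are already present (illustrative sketch).
def mergeConfList(conf: Map[String, String],
                  key: String,
                  extras: Seq[String]): Map[String, String] = {
  val existing = conf.get(key).map(_.split(",").toSeq).getOrElse(Seq.empty)
  val merged = existing ++ extras.filterNot(existing.contains)
  conf + (key -> merged.mkString(","))
}

val conf = Map("spark.files" -> "data.csv")
val withHive = mergeConfList(conf, "spark.files", Seq("/etc/hive/hive-site.xml"))
// withHive("spark.files") == "data.csv,/etc/hive/hive-site.xml"
```

Note that the merge always targets spark.files / spark.jars, regardless of whether the user supplied the spark.yarn.* variants instead.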

Now what seems to happen is that if all of spark.files, spark.jars, 
spark.yarn.dist.files, and spark.yarn.jars are non-null when the job is 
submitted (spark.files and spark.jars filled in by Livy; spark.yarn.dist.files 
and spark.yarn.jars filled in by the user request from our platform), then 
spark.yarn.dist.files gets set to spark.files and spark.yarn.jars gets set to 
spark.jars.

Since, for example, spark.files and spark.yarn.dist.files have the same 
semantics but are meant for non-YARN and YARN deployments respectively, Spark 
just overwrites spark.yarn.dist.files with the contents of spark.files. In 
general, these configuration properties should be mutually exclusive: one is 
designed for YARN mode and the other for non-YARN mode, and they should not be 
mixed.
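Concretely, the conflicting state at submission time looks something like this (values illustrative):

```
# Filled in by Livy because Hive support is enabled:
spark.files=/etc/hive/hive-site.xml
# Supplied by the user request from our platform:
spark.yarn.dist.files=hdfs:///user/app/udfs.py
# In YARN mode Spark resolves the conflict in favor of spark.files,
# so the user's spark.yarn.dist.files entries are silently lost.
```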

Our current solution is to deploy a fork of Livy on our platform where the code 
checks whether the user request has populated the spark.yarn.* properties; if 
so, all Livy-generated properties are appended to the YARN ones, and otherwise 
they are appended to the regular spark.* properties. See the code snippet here:

https://github.com/Limmen/incubator-livy/commit/aa06f896753ae9d6ce6aa66a80cca36a82f84202
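The essence of the workaround can be sketched like this (illustrative names, simplified from the actual commit): pick the spark.yarn.* key as the append target when the user request already populated it, otherwise fall back to the plain spark.* key.

```scala
// Append extras to the YARN variant of a property if the user already
// set it, otherwise to the plain variant (illustrative sketch of the fix).
def appendRespectingYarn(conf: Map[String, String],
                         plainKey: String,
                         yarnKey: String,
                         extras: Seq[String]): Map[String, String] = {
  val targetKey = if (conf.get(yarnKey).exists(_.nonEmpty)) yarnKey else plainKey
  val existing = conf.get(targetKey).map(_.split(",").toSeq).getOrElse(Seq.empty)
  conf + (targetKey -> (existing ++ extras.filterNot(existing.contains)).mkString(","))
}

val userConf = Map("spark.yarn.dist.files" -> "hdfs:///user/app/udfs.py")
val fixed = appendRespectingYarn(userConf, "spark.files",
  "spark.yarn.dist.files", Seq("/etc/hive/hive-site.xml"))
// fixed("spark.yarn.dist.files") ==
//   "hdfs:///user/app/udfs.py,/etc/hive/hive-site.xml"
// spark.files stays unset, so Spark never overwrites the YARN property.
```

This keeps the two property families mutually exclusive in the submitted conf, which avoids Spark's overwrite behavior described above.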



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
