Kim Hammar created LIVY-581:
-------------------------------
Summary: Edge-case where spark properties are overridden by Livy in
YARN environments
Key: LIVY-581
URL: https://issues.apache.org/jira/browse/LIVY-581
Project: Livy
Issue Type: Bug
Reporter: Kim Hammar
Fix For: 0.7.0
We use Livy inside our multi-tenant data science platform, which runs on
YARN and HDFS. Recently we added support for SparkSQL on Hive by placing the
necessary jar files in spark/jars, adding hive-site.xml to spark/conf, and
setting livy.repl.enableHiveContext=true in livy.conf.
However, yesterday I discovered that when Livy starts the Spark session, our
properties in spark.yarn.dist.files and spark.yarn.jars are overridden. This was
never an issue before we enabled Hive. Looking into the code, I found that if
Hive is enabled, Livy appends hive-site.xml (if not already present) to the
list of files specified by the user in the spark.files property, and appends
the necessary Hive jars to the list of jars specified by the user request in
the spark.jars property. See the related code snippet here:
[https://github.com/apache/incubator-livy/blob/56c76bc2d4563593edce062a563603fe63e5a431/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala#L285]
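To make the append behaviour described above concrete, here is a minimal Python sketch (Livy itself is written in Scala). The function name append_if_absent and the example paths are hypothetical, and the duplicate check is a plain string comparison, which is a simplification and not necessarily how the linked Livy code decides whether hive-site.xml is already present.

```python
def append_if_absent(conf, key, new_items):
    """Return a copy of conf with new_items appended to the
    comma-separated list under key, skipping entries already listed."""
    existing = [item for item in conf.get(key, "").split(",") if item]
    for item in new_items:
        if item not in existing:
            existing.append(item)
    merged = dict(conf)
    merged[key] = ",".join(existing)
    return merged


# Hypothetical user request that already sets spark.files.
user_conf = {"spark.files": "hdfs:///user/a/data.csv"}
with_hive = append_if_absent(
    user_conf, "spark.files", ["/opt/spark/conf/hive-site.xml"]
)
```

Note that spark.files ends up populated even when the user request never set it, which is the precondition for the conflict described next.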
What seems to happen is that if all of spark.files, spark.jars,
spark.yarn.dist.files, and spark.yarn.jars are non-null when the job is
submitted (spark.files and spark.jars filled in by Livy, spark.yarn.dist.files
and spark.yarn.jars filled in by the user request from our platform),
*_spark.yarn.dist.files gets set to spark.files and spark.yarn.jars gets set
to spark.jars_*.
Since, for example, spark.files and spark.yarn.dist.files have the same
semantics but are meant for non-YARN and YARN deployments, respectively,
Spark just overwrites spark.yarn.dist.files with the contents of spark.files.
In general, these configuration properties should be treated as mutually
exclusive: you should not mix them, as one is designed for YARN mode and the
other for non-YARN mode.
Our current solution is to deploy a fork of Livy on our platform in which I
check whether the user request has populated the spark.yarn.X properties. If
so, I append all Livy-generated properties to the YARN ones; otherwise I
append the Livy-generated properties to the regular spark.X properties. See
the code snippet here:
https://github.com/Limmen/incubator-livy/commit/aa06f896753ae9d6ce6aa66a80cca36a82f84202
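The idea behind the workaround can be sketched in Python as follows. This is an illustration of the key-selection logic only, not the code in the linked commit (which is Scala); the function names target_key and merge_livy_resources and the example paths are hypothetical.

```python
def target_key(conf, yarn_key, generic_key):
    """Pick the YARN-specific key if the user request populated it,
    otherwise fall back to the generic spark.X key."""
    return yarn_key if conf.get(yarn_key) else generic_key


def merge_livy_resources(conf, livy_files, livy_jars):
    """Return a copy of conf with the Livy-generated files and jars
    appended to whichever key family the user request already uses."""
    merged = dict(conf)
    for yarn_key, generic_key, extra in (
        ("spark.yarn.dist.files", "spark.files", livy_files),
        ("spark.yarn.jars", "spark.jars", livy_jars),
    ):
        key = target_key(merged, yarn_key, generic_key)
        items = [i for i in merged.get(key, "").split(",") if i]
        items += [e for e in extra if e not in items]
        if items:
            merged[key] = ",".join(items)
    return merged


# Hypothetical request from a platform that uses the YARN-specific keys.
yarn_request = {
    "spark.yarn.dist.files": "hdfs:///proj/env.zip",
    "spark.yarn.jars": "hdfs:///proj/libs/*.jar",
}
result = merge_livy_resources(
    yarn_request,
    livy_files=["/opt/spark/conf/hive-site.xml"],
    livy_jars=["/opt/spark/jars/hive-exec.jar"],
)
```

With this approach a request never ends up with both the spark.X and the spark.yarn.X variant of the same property populated, so the overwrite described above is not triggered.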
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)