[
https://issues.apache.org/jira/browse/LIVY-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808671#comment-16808671
]
Kim Hammar commented on LIVY-581:
---------------------------------
PR: https://github.com/apache/incubator-livy/pull/165
> Edge-case where spark properties are overridden by Livy in YARN environments
> ----------------------------------------------------------------------------
>
> Key: LIVY-581
> URL: https://issues.apache.org/jira/browse/LIVY-581
> Project: Livy
> Issue Type: Bug
> Reporter: Kim Hammar
> Priority: Minor
> Fix For: 0.7.0
>
> Original Estimate: 1h
> Time Spent: 10m
> Remaining Estimate: 50m
>
> We use livy inside our multi-tenant data science platform that is running on
> YARN and HDFS. Recently we added support for SparkSQL on Hive by placing the
> necessary jar files in spark/jars, adding hive-site.xml in spark/conf and
> setting livy.repl.enableHiveContext=true in livy.conf.
> However, yesterday I discovered that when livy starts the spark session it
> overrides our properties in spark.yarn.dist.files and spark.yarn.jars. This
> was never an issue before we enabled hive. Looking into the code, I found
> that if hive is enabled, livy appends (if not already present) the
> hive-site.xml to the list of files specified by the user in the
> spark.files property and the necessary hive jars to the list of spark jars
> specified by the user-request in the property spark.jars; see the related
> code snippet here:
> [https://github.com/apache/incubator-livy/blob/56c76bc2d4563593edce062a563603fe63e5a431/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala#L285]
> Now what seems to happen is that if all of spark.files, spark.jars,
> spark.yarn.dist.files, and spark.yarn.jars are non-null when the job is
> submitted (spark.files and spark.jars filled in by livy, and
> spark.yarn.dist.files and spark.yarn.jars filled in by the user-request from
> our platform), spark.yarn.dist.files gets set to spark.files and
> spark.yarn.jars gets set to spark.jars.
> Since, for example, spark.files and spark.yarn.dist.files have the same
> semantics but are supposed to be used for non-yarn and yarn deployments,
> respectively, spark just overwrites spark.yarn.dist.files with the contents
> of spark.files. In general, these configuration properties should be mutually
> exclusive: you should not mix them, as one is designed for YARN mode and the
> other is for non-YARN mode.
> Our current solution is to deploy a fork of livy on our platform where I
> check in the code whether the user-request has populated the spark.yarn.X
> properties; if so, I append all livy-generated properties to the yarn ones.
> Otherwise I append the livy-generated properties to the regular spark.X
> properties, see code snippet here:
> https://github.com/Limmen/incubator-livy/commit/aa06f896753ae9d6ce6aa66a80cca36a82f84202
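For readers who want the idea without opening the commit, the workaround described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the fork or from Livy itself: the function name merge_livy_entries, the dictionary-based configuration, and the example paths are all assumptions made for the sketch.

```python
def merge_livy_entries(conf, generic_key, yarn_key, livy_entries):
    """Return a copy of conf with livy_entries appended to exactly one property.

    Hypothetical helper: if the user request populated the YARN-specific
    property, the entries go there; otherwise they go to the generic one.
    The two properties are never both filled in by this function.
    """
    merged = dict(conf)
    # Prefer the YARN-specific property when the user request populated it.
    target = yarn_key if merged.get(yarn_key) else generic_key
    # Spark list properties are comma-separated strings.
    existing = [e for e in merged.get(target, "").split(",") if e]
    for entry in livy_entries:
        if entry not in existing:  # append only if not already present
            existing.append(entry)
    if existing:
        merged[target] = ",".join(existing)
    return merged


# Example paths below are made up for illustration.
user_conf = {"spark.yarn.dist.files": "hdfs:///project/cert.pem"}
result = merge_livy_entries(
    user_conf,
    "spark.files",
    "spark.yarn.dist.files",
    ["/srv/spark/conf/hive-site.xml"],
)
print(result)
```

In this sketch the same call would be made once for the spark.files / spark.yarn.dist.files pair and once for the spark.jars / spark.yarn.jars pair, so the generic and YARN-specific properties are never both populated by the server.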