[ https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105066#comment-14105066 ]
Chinna Rao Lalam commented on HIVE-7593:
----------------------------------------

Hi [~brocknoland],

Right, if SPARK-2243 is in, our approach will change totally. I think SPARK-2243 will take some time to complete; meanwhile, the current patch adds missing functionality, such as recreating the SparkClient when configurations are updated (through the set command). Once SPARK-2243 is in, we can change our approach completely. For this we can add a follow-up JIRA, or I can file a separate JIRA for the current patch and keep this JIRA as it is. Please share your thoughts on this.

> Instantiate SparkClient per user session [Spark Branch]
> -------------------------------------------------------
>
>                 Key: HIVE-7593
>                 URL: https://issues.apache.org/jira/browse/HIVE-7593
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-7593-spark.patch, HIVE-7593.1-spark.patch
>
>
> SparkContext is the main class via which Hive talks to the Spark cluster.
> SparkClient encapsulates a SparkContext instance. Currently all user sessions
> share a single SparkClient instance in HiveServer2. While this is good enough
> for a POC, and even for our first two milestones, it is not desirable in a
> multi-tenancy environment and gives Hive users the least flexibility. Here is
> what we propose:
> 1. Have a SparkClient instance per user session. The SparkClient instance is
> created when the user executes the first query in the session. It gets
> destroyed when the user session ends.
> 2. The SparkClient is instantiated based on the Spark configurations that are
> available to the user, including those defined at the global level and those
> overridden by the user (through the set command, for instance).
> 3. Ideally, when the user changes any Spark configuration during the session,
> the old SparkClient instance should be destroyed and a new one created based
> on the new configurations.
> This may turn out to be a little hard, and thus
> it's a "nice-to-have". If not implemented, we need to document that
> subsequent configuration changes will not take effect in the current session.
> Please note that there is a thread-safety issue on the Spark side where multiple
> SparkContext instances cannot coexist in the same JVM (SPARK-2243). We need
> to work with the Spark community to get this addressed.
> Besides the functional requirements above, avoiding potential issues is also a
> consideration. For instance, sharing a SparkContext among users is bad, as
> resources (such as jars for UDFs) are also shared, which is problematic. On the
> other hand, one SparkContext per job seems too expensive, as resources would
> need to be re-registered even when nothing has changed.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
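The per-session lifecycle proposed above (lazy creation on first query, destruction at session end, recreation when the session's Spark configuration changes) could be sketched roughly as follows. This is a minimal illustration only: SessionSparkClientManager and SparkClientStub are hypothetical names, not Hive's actual classes, and real code would also need synchronization around client creation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the real SparkClient, which wraps a SparkContext.
class SparkClientStub {
    final Map<String, String> conf;
    boolean closed = false;

    SparkClientStub(Map<String, String> conf) {
        // Snapshot the configuration the client was built with, so we can
        // later detect whether the session's configuration has diverged.
        this.conf = new HashMap<>(conf);
    }

    void close() { closed = true; }
}

// Hypothetical per-session client registry.
class SessionSparkClientManager {
    // One client per session id, created lazily on the session's first query.
    private final Map<String, SparkClientStub> clients = new ConcurrentHashMap<>();

    SparkClientStub getClient(String sessionId, Map<String, String> sessionConf) {
        SparkClientStub client = clients.get(sessionId);
        // If the session's Spark configuration changed (e.g. via a "set"
        // command) since the client was built, tear it down and rebuild.
        if (client != null && !Objects.equals(client.conf, sessionConf)) {
            client.close();
            client = null;
        }
        if (client == null) {
            client = new SparkClientStub(sessionConf);
            clients.put(sessionId, client);
        }
        return client;
    }

    // Destroy the client when the user session ends.
    void closeSession(String sessionId) {
        SparkClientStub client = clients.remove(sessionId);
        if (client != null) {
            client.close();
        }
    }
}
```

Note that until SPARK-2243 is resolved, only one SparkContext can exist per JVM, so two live per-session clients in the same HiveServer2 process would still conflict; this sketch only illustrates the intended lifecycle.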