[ https://issues.apache.org/jira/browse/HIVE-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630682#comment-14630682 ]

Chengxiang Li commented on HIVE-11276:
--------------------------------------

[~xuefuz], I reviewed the code in RemoteHiveSparkClient. The reason it needs to 
invoke refreshLocalResources() for every job submission is that a Hive user may 
use the "ADD \[FILE|JAR|ARCHIVE\] <value>" command to add resources at runtime, 
so the Spark client needs to upload those resources to the Spark cluster before 
job execution. RemoteHiveSparkClient keeps a list of all the resources it has 
already uploaded to the Spark cluster and uses it during refreshLocalResources() 
to filter out jars that were already uploaded, so only newly added jars are 
uploaded to the cluster. Since that list stays quite small most of the time, I 
don't think there is a performance issue here.

> Optimization around job submission and adding jars [Spark Branch]
> -----------------------------------------------------------------
>
>                 Key: HIVE-11276
>                 URL: https://issues.apache.org/jira/browse/HIVE-11276
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Chengxiang Li
>
> It seems that Hive on Spark has some room for performance improvement on job 
> submission. Specifically, we are calling refreshLocalResources() for every 
> job submission even when there are no changes in the jar list. Since Hive on 
> Spark reuses the containers for the whole user session, we might be able 
> to optimize that.
> We do need to take into consideration the case of dynamic allocation, in 
> which new executors might be added.
> This task is some R&D in this area.
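
As a purely illustrative sketch of the direction the description hints at (not 
an actual patch), one could track whether any ADD FILE/JAR/ARCHIVE command ran 
since the last submission and skip the refresh otherwise; every name below is 
hypothetical, and the dynamic-allocation caveat from the description is carried 
as an explicit flag rather than resolved here.

{code:java}
// Hypothetical guard around refreshLocalResources(); names are illustrative only.
public class SubmissionGuard {
  private long resourceVersion = 0;        // bumped on each ADD FILE/JAR/ARCHIVE
  private long lastSubmittedVersion = -1;  // version seen at the previous job

  public void onResourceAdded() {
    resourceVersion++;
  }

  // True only if resources changed since the last submission, or if the caller
  // forces a refresh (e.g. to stay safe under dynamic allocation, where new
  // executors may appear mid-session).
  public boolean needsRefresh(boolean forceForDynamicAllocation) {
    return forceForDynamicAllocation || resourceVersion != lastSubmittedVersion;
  }

  public void onJobSubmitted() {
    lastSubmittedVersion = resourceVersion;
  }
}
{code}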



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
