[ https://issues.apache.org/jira/browse/HIVE-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang reassigned HIVE-7768:
---------------------------------

Assignee: Chengxiang Li  (was: Szehon Ho)

Since Szehon is busy with the SMB->MJ conversion and Chengxiang is helping with the Spark integration, I am moving this JIRA to Chengxiang. [~chengxiang li], could you please start looking at this issue when you have time?

> Integrate with Spark executor scaling [Spark Branch]
> ----------------------------------------------------
>
>                 Key: HIVE-7768
>                 URL: https://issues.apache.org/jira/browse/HIVE-7768
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Brock Noland
>            Assignee: Chengxiang Li
>            Priority: Critical
>
> Scenario:
> A user connects to Hive and runs a query on a small table. Our SparkContext (SC) is sized for that small table. They then run a query on a much larger table. We would need to "re-size" the SC, which I don't think Spark supports today, so we need to research what is available in Spark today and how Tez handles this.
>
> More details:
> Similar to Tez, our SparkContext is likely to be long-lived and to process many queries. Some queries will be large and some small. Additionally, the SC might be idle for long periods of time.
>
> In this JIRA we will research the following:
> * How Spark decides the number of slaves for a given RDD today
> * Given an SC, when you create a new RDD based on a much larger input dataset, does the SC adjust?
> * How Tez increases/decreases the size of the running YARN application (set of slaves)
> * How Tez handles scenarios where it has a running set of slaves in YARN, requests more resources for a query, and fails to get additional resources
> * How Tez decides to time out idle slaves
>
> This will guide the requirements we'll need from Spark.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
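[Editor's note: the resizing behavior researched above corresponds to what later shipped in Spark as "dynamic allocation" (Spark 1.2+), which grows and shrinks the executor set of a long-lived SparkContext and times out idle executors. A minimal configuration sketch, assuming a YARN deployment with the external shuffle service enabled; the specific values are illustrative, not recommendations:]

```properties
# Let the SparkContext request and release executors as load changes
spark.dynamicAllocation.enabled=true

# Bounds on the executor set for a long-lived, multi-query context
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=50

# Release executors that have been idle this long (the "timeout idle slaves" case)
spark.dynamicAllocation.executorIdleTimeout=60s

# Required so shuffle data outlives released executors
spark.shuffle.service.enabled=true
```

With these settings a query over a much larger RDD causes additional executor requests up to maxExecutors, and if YARN cannot grant them the job still runs on whatever was acquired, which mirrors the Tez failure-to-scale scenario listed above.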