[ https://issues.apache.org/jira/browse/HIVE-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jimmy Xiang reassigned HIVE-9339:
---------------------------------

    Assignee: Jimmy Xiang

> Optimize split grouping for CombineHiveInputFormat [Spark Branch]
> -----------------------------------------------------------------
>
>                 Key: HIVE-9339
>                 URL: https://issues.apache.org/jira/browse/HIVE-9339
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Jimmy Xiang
>
> It seems that split generation, especially in terms of grouping inputs, needs
> to be improved. For this, we may need cluster information. Because of this,
> we will first try to solve the problem for Spark.
> As to cluster information, Spark doesn't provide an API (SPARK-5080).
> However, Spark does have a listener API, with which the Spark driver can get
> notifications about executors going up/down, tasks starting/finishing, etc.
> With this information, the Spark client should be able to maintain a view of
> the current cluster state.
> Spark developers mentioned that the listener can only be created after
> SparkContext is started, at which time some executions may have already
> started, so the listener will miss some information. This can be fixed.
> We can file a JIRA with the Spark project if necessary.
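
To make the listener idea in the description concrete, here is a minimal sketch of how a client could fold executor and task notifications into a view of the cluster. Everything in it is hypothetical: `ClusterView` and its `on...` methods are illustrative names, not Spark or Hive APIs, and the free-core estimate assumes each running task occupies one core. A real implementation would call methods like these from a Spark listener's callbacks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are Spark or Hive APIs.
final class ClusterView {
    // Per-executor bookkeeping: host, core count, and tasks currently running.
    private static final class ExecutorInfo {
        final String host;
        final int cores;
        int runningTasks;

        ExecutorInfo(String host, int cores) {
            this.host = host;
            this.cores = cores;
        }
    }

    private final Map<String, ExecutorInfo> executors = new HashMap<>();

    synchronized void onExecutorAdded(String executorId, String host, int cores) {
        executors.put(executorId, new ExecutorInfo(host, cores));
    }

    synchronized void onExecutorRemoved(String executorId) {
        executors.remove(executorId);
    }

    synchronized void onTaskStart(String executorId) {
        ExecutorInfo info = executors.get(executorId);
        // An unknown executor means its "added" event was missed, e.g. because
        // the listener was registered late; the event is dropped here.
        if (info != null) {
            info.runningTasks++;
        }
    }

    synchronized void onTaskEnd(String executorId) {
        ExecutorInfo info = executors.get(executorId);
        if (info != null && info.runningTasks > 0) {
            info.runningTasks--;
        }
    }

    // Total cores across all executors currently known to be alive.
    synchronized int totalCores() {
        int total = 0;
        for (ExecutorInfo info : executors.values()) {
            total += info.cores;
        }
        return total;
    }

    // Cores not occupied by a running task (assumes one core per task).
    synchronized int freeCores() {
        int free = 0;
        for (ExecutorInfo info : executors.values()) {
            free += Math.max(0, info.cores - info.runningTasks);
        }
        return free;
    }

    public static void main(String[] args) {
        ClusterView view = new ClusterView();
        view.onExecutorAdded("exec-1", "host-a", 4);
        view.onExecutorAdded("exec-2", "host-b", 8);
        view.onTaskStart("exec-1");
        view.onExecutorRemoved("exec-2");
        System.out.println(view.totalCores() + " cores known, " + view.freeCores() + " free");
    }
}
```

Split grouping could then consult something like `totalCores()` when choosing how many groups to form. Note that the view is only as complete as the events it has seen, which is why the late-registration gap mentioned in the description matters.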