[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated HIVE-8118:
------------------------------
    Summary: Support work that have multiple child works to work around SPARK [Spark Branch]  (was: SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors [Spark Branch])

> Support work that have multiple child works to work around SPARK [Spark Branch]
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-8118
>                 URL: https://issues.apache.org/jira/browse/HIVE-8118
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>              Labels: Spark-M1
>         Attachments: HIVE-8118.pdf
>
>
> In the current implementation, both SparkMapRecordHandler and
> SparkReduceRecordHandler take only one result collector, which means that
> the corresponding map or reduce task can have only one child. It is very
> common in multi-insert queries for a map/reduce task to have more than one
> child. A query like the following has two map tasks as parents:
> {code}
> select name, sum(value) from dec group by name union all select name, value
> from dec order by name
> {code}
> It's possible that in the future an optimization may be implemented so that a
> map work is followed by two reduce works and then connected to a union work.
> Thus, we should treat this as a general case. Tez currently provides a
> collector for each child operator in the map-side or reduce-side operator
> tree. We can take Tez as a reference.
> Likely this is a big change and subtasks are possible.
> With this, we can have a simpler and cleaner multi-insert implementation.
> This is also the problem observed in HIVE-7731.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
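
The idea in the description above, a record handler that is initialized with one result collector per child work instead of a single collector, could be sketched roughly as follows. This is only an illustration under stated assumptions: `ResultCollector`, `BufferingCollector`, `RecordHandler`, and the child names `"reduce-1"` and `"reduce-2"` are hypothetical and are not Hive's, Spark's, or Tez's actual classes or APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiCollectorSketch {

    /** Hypothetical stand-in for a result collector; not Hive's actual interface. */
    interface ResultCollector {
        void collect(String key, String value);
    }

    /** Buffers records in memory so the routing can be inspected. */
    static class BufferingCollector implements ResultCollector {
        final List<String> records = new ArrayList<>();

        @Override
        public void collect(String key, String value) {
            records.add(key + "=" + value);
        }
    }

    /** Hypothetical record handler that holds one collector per child work. */
    static class RecordHandler {
        private final Map<String, ResultCollector> collectors;

        RecordHandler(Map<String, ResultCollector> collectors) {
            // Defensive copy so later changes to the caller's map have no effect.
            this.collectors = new LinkedHashMap<>(collectors);
        }

        /** Sends a record to the collector registered for the named child. */
        void emit(String childName, String key, String value) {
            ResultCollector collector = collectors.get(childName);
            if (collector == null) {
                throw new IllegalArgumentException("No collector for child: " + childName);
            }
            collector.collect(key, value);
        }
    }

    public static void main(String[] args) {
        BufferingCollector first = new BufferingCollector();
        BufferingCollector second = new BufferingCollector();

        Map<String, ResultCollector> collectors = new LinkedHashMap<>();
        collectors.put("reduce-1", first);   // hypothetical child work name
        collectors.put("reduce-2", second);  // hypothetical child work name

        RecordHandler handler = new RecordHandler(collectors);
        handler.emit("reduce-1", "alice", "3");
        handler.emit("reduce-2", "alice", "1");
        handler.emit("reduce-2", "bob", "2");

        System.out.println("reduce-1 received " + first.records);
        System.out.println("reduce-2 received " + second.records);
    }
}
```

Keying the collectors by child name is one possible design choice; the point is only that the handler no longer assumes a single downstream consumer.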