[jira] [Created] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch]

Xuefu Zhang (JIRA) Mon, 15 Sep 2014 21:13:28 -0700

Xuefu Zhang created HIVE-8118:
---------------------------------

             Summary: SparkMapRecorderHandler and SparkReduceRecordHandler 
should be initialized with multiple result collectors[Spark Branch]
                 Key: HIVE-8118
                 URL: https://issues.apache.org/jira/browse/HIVE-8118
             Project: Hive
          Issue Type: Bug
          Components: Spark
            Reporter: Xuefu Zhang



In the current implementation, both SparkMapRecordHandler and 
SparkReduceRecordHandler take only one result collector, which means that 
the corresponding map or reduce task can have only one child. It's very common 
in multi-insert queries for a map/reduce task to have more than one child. A 
query like the following has two map tasks as parents:

{code}
select name, sum(value) from dec group by name union all select name, value 
from dec order by name
{code}

It's possible that in the future an optimization may be implemented so that a 
map work is followed by two reduce works, which are then connected to a union work.

Thus, we should accommodate this. Tez currently provides a collector for 
each child operator in the map-side or reduce-side operator tree.
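
To make the idea concrete, here is a minimal sketch of a record handler that is 
initialized with one collector per child rather than a single collector. All 
names in it (ResultCollector, ListCollector, MultiCollectorHandler, emit) are 
hypothetical and chosen for illustration; they are not the existing Hive or Tez 
classes, and the actual change may look quite different.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of these are Hive or Tez classes.
interface ResultCollector {
    void collect(String key, String value);
}

// Buffers records in memory so the routing can be inspected.
class ListCollector implements ResultCollector {
    final List<String> records = new ArrayList<>();

    @Override
    public void collect(String key, String value) {
        records.add(key + "=" + value);
    }
}

// A record handler initialized with one collector per child,
// keyed by the child's name, instead of a single shared collector.
class MultiCollectorHandler {
    private final Map<String, ResultCollector> collectors;

    MultiCollectorHandler(Map<String, ResultCollector> collectors) {
        // Defensive copy so later changes to the caller's map have no effect.
        this.collectors = new LinkedHashMap<>(collectors);
    }

    // Route a record to the collector registered for the given child.
    void emit(String child, String key, String value) {
        ResultCollector collector = collectors.get(child);
        if (collector == null) {
            throw new IllegalArgumentException("No collector for child: " + child);
        }
        collector.collect(key, value);
    }
}
```

With this shape, a map work followed by two reduce works would register two 
collectors, and each emitted record would reach only the child it is meant for.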

This is likely a big change. With it, we can have a simpler and cleaner 
multi-insert implementation.

This is also the problem observed in HIVE-7731.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
