[ 
https://issues.apache.org/jira/browse/HIVE-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216249#comment-16216249
 ] 

Rui Li commented on HIVE-17877:
-------------------------------

Uploaded a PoC patch. Here are the main changes:
# Before combining, each {{SparkPartitionPruningSinkDesc}} can target only one 
column in one map work. After combining, the remaining 
{{SparkPartitionPruningSinkDesc}} will hold the columns and map works from 
the other equivalent {{SparkPartitionPruningSinkDesc}}.
# Two {{SparkPartitionPruningSinkDesc}} are equivalent if they have the same 
TableDesc.
# When we combine two equivalent works that contain DPP sinks, we also merge 
the DPP sinks. Suppose we merge DPP1 and DPP2, which have target map 
works Map1 and Map2 respectively. First we add the target column/work of DPP2 
to DPP1. Then we update Map2 so that it knows it'll be pruned by DPP1 instead 
of DPP2, i.e. we update its {{eventSource}} maps and tmp path.
# Currently {{CombineEquivalentWorkResolver}} doesn't handle leaf works. With 
the patch, it'll handle leaf works if all leaf operators in the leaf works are 
DPP sinks.
# Currently {{SparkPartitionPruningSinkOperator}} writes the target column name 
into the output file. Since now it can have multiple target columns, it first 
writes the number of columns and then writes all the target column names. In 
order to make column names unique, the target map work ID will be prepended to 
the column name.
# When {{SparkDynamicPartitionPruner}} reads the file, it reads in all the 
column names and finds the {{SourceInfo}} whose name is in the column names.
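
To make the merge step and the new file layout above concrete, here is a minimal, self-contained sketch. It is not code from the patch: {{Target}}, {{Sink}}, {{merge}}, {{writeHeader}} and {{readHeader}} are hypothetical stand-ins, and both the ":" separator between work ID and column name and the use of {{writeInt}}/{{writeUTF}} are assumptions made only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class DppSinkSketch {
    /** One pruning target: a column in a particular map work (hypothetical type). */
    static final class Target {
        final String workId;
        final String column;

        Target(String workId, String column) {
            this.workId = workId;
            this.column = column;
        }

        /** Prepends the work ID so equal column names in different works stay distinct. */
        String qualifiedName() {
            return workId + ":" + column; // the ":" separator is an assumption
        }
    }

    /** Simplified stand-in for a DPP sink that can hold several targets after merging. */
    static final class Sink {
        final List<Target> targets = new ArrayList<>();
    }

    /** Moves all targets of 'from' into 'into'; repointing the target works is not modeled. */
    static void merge(Sink into, Sink from) {
        into.targets.addAll(from.targets);
        from.targets.clear();
    }

    /** Writes the number of target columns, then each qualified column name. */
    static byte[] writeHeader(Sink sink) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(sink.targets.size());
            for (Target t : sink.targets) {
                out.writeUTF(t.qualifiedName());
            }
        }
        return bytes.toByteArray();
    }

    /** Reads the count and then that many column names, in the order they were written. */
    static List<String> readHeader(byte[] data) throws IOException {
        List<String> names = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                names.add(in.readUTF());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Sink dpp1 = new Sink();
        dpp1.targets.add(new Target("Map1", "ds"));
        Sink dpp2 = new Sink();
        dpp2.targets.add(new Target("Map2", "ds"));

        merge(dpp1, dpp2); // DPP1 now prunes both Map1 and Map2

        List<String> names = readHeader(writeHeader(dpp1));
        System.out.println(names); // prints [Map1:ds, Map2:ds]
        // The reader side for Map2 checks whether its own qualified name is listed.
        System.out.println(names.contains(new Target("Map2", "ds").qualifiedName())); // prints true
    }
}
```

In this sketch both works target a column named "ds", which is why the work ID prefix is needed: without it the reader could not tell which map work a column name belongs to.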

> HoS: combine equivalent DPP sink works
> --------------------------------------
>
>                 Key: HIVE-17877
>                 URL: https://issues.apache.org/jira/browse/HIVE-17877
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-17877.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
