[ https://issues.apache.org/jira/browse/HIVE-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deepak Jaiswal updated HIVE-19967:
----------------------------------
Status: Patch Available (was: In Progress)
> SMB Join : ReduceSink should use correct keys in optraits
> ---------------------------------------------------------
>
> Key: HIVE-19967
> URL: https://issues.apache.org/jira/browse/HIVE-19967
> Project: Hive
> Issue Type: Task
> Reporter: Deepak Jaiswal
> Assignee: Deepak Jaiswal
> Priority: Major
>
> The OpTraits for ReduceSinkOp used to use the key columns as the bucket and
> sort columns, which worked fine for SMB. However, to enable prefix support in
> Bucket Map Join, this logic was updated to use the bucket columns from the
> parent operators. This can break reduce-side SMB join in a scenario like this:
>
> Task1 (TS bucketed by col0) passes its output to an RS, which ignores its key
> columns and uses col0 as the bucket key.
> Task2 (a set of operators that leave the data sorted by some set of columns):
> with the current logic, the bucketing column from Task1 keeps getting
> propagated through the OpTraits, so the real data flow is lost.
> Task3 (Join op): the physical optimizer looks at the parent RS ops, which
> happen to be sorted by the same column as Task1's original bucket column,
> but along the way that column has lost its meaning as a bucket key.
>
>
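The propagation problem described above can be illustrated with a toy model. This is a minimal sketch, assuming a simplified traits record holding only bucket and sort columns; OpTraitsSketch, Traits, Policy, fromKeys, inheritBuckets and propagate are hypothetical names and are not Hive's OpTraits or ReduceSinkOperator API. It contrasts the two policies from the description (RS reports its own key columns vs. RS inherits the parent's bucket columns) across a hypothetical chain of RS operators keyed first on col1 and then on col2.

```java
import java.util.List;

// Toy model (hypothetical; NOT Hive's OpTraits/ReduceSinkOperator classes) of how
// bucket columns propagate through a chain of ReduceSink (RS) operators.
public class OpTraitsSketch {

    // Simplified traits: the columns the data is bucketed on and sorted on.
    record Traits(List<String> bucketCols, List<String> sortCols) {}

    // Shape shared by both policies: derive an RS's traits from its parent and its keys.
    interface Policy {
        Traits apply(Traits parent, List<String> rsKeys);
    }

    // Original behaviour per the issue: the RS reports its own key columns as both
    // bucket and sort columns (the parent is ignored under this policy).
    static Traits fromKeys(Traits parent, List<String> rsKeys) {
        return new Traits(List.copyOf(rsKeys), List.copyOf(rsKeys));
    }

    // Changed behaviour per the issue: bucket columns are inherited from the parent,
    // while sort columns still follow the RS keys.
    static Traits inheritBuckets(Traits parent, List<String> rsKeys) {
        return new Traits(parent.bucketCols(), List.copyOf(rsKeys));
    }

    // Push traits from the scan through a sequence of RS operators, one per key list.
    static Traits propagate(Traits scan, List<List<String>> rsKeyChain, Policy policy) {
        Traits t = scan;
        for (List<String> keys : rsKeyChain) {
            t = policy.apply(t, keys);
        }
        return t;
    }

    public static void main(String[] args) {
        // Task1: the table scan is bucketed and sorted by col0.
        Traits scan = new Traits(List.of("col0"), List.of("col0"));
        // Hypothetical downstream shuffles: first on col1, then on col2.
        List<List<String>> chain = List.of(List.of("col1"), List.of("col2"));

        System.out.println("key-based: " + propagate(scan, chain, OpTraitsSketch::fromKeys));
        System.out.println("inherited: " + propagate(scan, chain, OpTraitsSketch::inheritBuckets));
    }
}
```

Running main prints "key-based: Traits[bucketCols=[col2], sortCols=[col2]]" and "inherited: Traits[bucketCols=[col0], sortCols=[col2]]". Under the key-based policy the reported bucket columns follow the last shuffle, whereas under the inherited policy Task1's col0 is still reported after two redistributions, which is the kind of stale trait the physical optimizer could then misread when deciding on a reduce-side SMB join.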