[
https://issues.apache.org/jira/browse/HIVE-15844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Koifman updated HIVE-15844:
----------------------------------
Description:
# Both FileSinkDesc and ReduceSinkDesc have a special code path for Update/Delete
operations, but the write type is not always set correctly for ReduceSink;
ReduceSinkDeDuplication is one place where it gets lost. Even when it isn't set
correctly, elsewhere
(_SemanticAnalyzer.getPartitionColsFromBucketColsForUpdateDelete()_) we set
ROW_ID to be the partition column of the ReduceSinkOperator, and UDFToInteger
special-cases it to extract the bucketId from ROW_ID. We need to modify Explain
Plan to record the Write Type (i.e. insert/update/delete) so that we have
tests that can catch errors here.
# Add validation at the end of plan generation to make sure that any RSO/FSO
that represents the end of the pipeline and writes to an ACID table has its
WriteType set (to something other than the default).
# We don't seem to have any tests where the number of buckets is greater than
the number of reducers. Add those.
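
The validation pass in item 2 could be sketched roughly as below. This uses simplified stand-in types (SinkOp, WriteType, validate are all hypothetical names, not Hive's actual descriptor or operator classes) purely to illustrate the proposed check: every terminal sink writing to an ACID table must carry a non-default write type.

```java
import java.util.List;

public class WriteTypeValidation {
    // Stand-in for the insert/update/delete marker on a sink descriptor.
    enum WriteType { NOT_ACID, INSERT, UPDATE, DELETE }

    // Stand-in for a terminal ReduceSink/FileSink operator in the plan.
    static class SinkOp {
        final String name;
        final boolean writesToAcidTable;
        final WriteType writeType;
        SinkOp(String name, boolean acid, WriteType wt) {
            this.name = name;
            this.writesToAcidTable = acid;
            this.writeType = wt;
        }
    }

    // The proposed end-of-planning pass: flag any ACID-writing terminal
    // sink that was left at the default write type.
    static void validate(List<SinkOp> terminalSinks) {
        for (SinkOp op : terminalSinks) {
            if (op.writesToAcidTable && op.writeType == WriteType.NOT_ACID) {
                throw new IllegalStateException("Sink " + op.name
                    + " writes to an ACID table but has default WriteType");
            }
        }
    }

    public static void main(String[] args) {
        // A delete sink with its write type set passes the check.
        validate(List.of(new SinkOp("FS_1", true, WriteType.DELETE)));
        // A sink that lost its write type (e.g. via de-duplication) is caught.
        try {
            validate(List.of(new SinkOp("RS_2", true, WriteType.NOT_ACID)));
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

A pass like this would have caught the ReduceSinkDeDuplication case above at compile time rather than via corrupted bucket files at read time.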
> Make ReduceSinkOperator independent of Acid
> -------------------------------------------
>
> Key: HIVE-15844
> URL: https://issues.apache.org/jira/browse/HIVE-15844
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Fix For: 2.2.0
>
> Attachments: HIVE-15844.01.patch, HIVE-15844.02.patch,
> HIVE-15844.03.patch, HIVE-15844.04.patch, HIVE-15844.05.patch,
> HIVE-15844.06.patch, HIVE-15844.07.patch, HIVE-15844.08.patch
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)