[
https://issues.apache.org/jira/browse/HIVE-15844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Koifman updated HIVE-15844:
----------------------------------
Description:
# Both FileSinkDesc and ReduceSinkDesc have a special code path for Update/Delete
operations, but the write type is not always set correctly for ReduceSink;
ReduceSinkDeDuplication is one place where it gets lost. Even when it isn't set
correctly, elsewhere
(_SemanticAnalyzer.getPartitionColsFromBucketColsForUpdateDelete()_) we set
ROW_ID to be the partition column of the ReduceSinkOperator, and UDFToInteger
special-cases it to extract the bucketId from ROW_ID. We need to modify Explain
Plan to record the Write Type (i.e. insert/update/delete) so that we have
tests that can catch errors here.
# Add validation at the end of plan generation to make sure that any RSO/FSO
that represents the end of the pipeline and writes to an ACID table has its
WriteType set (to something other than the default).
# We don't seem to have any tests where the number of buckets is greater than
the number of reducers. Add those.
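
The validation pass in item 2 could be sketched roughly as below. This uses simplified stand-in types (SinkOp, WriteType, validate are all hypothetical names, not Hive's actual descriptor or operator classes) purely to illustrate the proposed check: every terminal sink writing to an ACID table must carry a non-default write type.

```java
import java.util.List;

public class WriteTypeValidation {
    // Stand-in for the insert/update/delete marker on a sink descriptor.
    enum WriteType { NOT_ACID, INSERT, UPDATE, DELETE }

    // Stand-in for a terminal ReduceSink/FileSink operator in the plan.
    static class SinkOp {
        final String name;
        final boolean writesToAcidTable;
        final WriteType writeType;
        SinkOp(String name, boolean acid, WriteType wt) {
            this.name = name;
            this.writesToAcidTable = acid;
            this.writeType = wt;
        }
    }

    // The proposed end-of-planning pass: flag any ACID-writing terminal
    // sink that was left at the default write type.
    static void validate(List<SinkOp> terminalSinks) {
        for (SinkOp op : terminalSinks) {
            if (op.writesToAcidTable && op.writeType == WriteType.NOT_ACID) {
                throw new IllegalStateException("Sink " + op.name
                    + " writes to an ACID table but has default WriteType");
            }
        }
    }

    public static void main(String[] args) {
        // A delete sink with its write type set passes the check.
        validate(List.of(new SinkOp("FS_1", true, WriteType.DELETE)));
        // A sink that lost its write type (e.g. via de-duplication) is caught.
        try {
            validate(List.of(new SinkOp("RS_2", true, WriteType.NOT_ACID)));
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

A pass like this would have caught the ReduceSinkDeDuplication case above at compile time rather than via corrupted bucket files at read time.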
> Make ReduceSinkOperator independent of Acid
> -------------------------------------------
>
> Key: HIVE-15844
> URL: https://issues.apache.org/jira/browse/HIVE-15844
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Fix For: 2.2.0
>
> Attachments: HIVE-15844.01.patch, HIVE-15844.02.patch,
> HIVE-15844.03.patch, HIVE-15844.04.patch, HIVE-15844.05.patch,
> HIVE-15844.06.patch, HIVE-15844.07.patch, HIVE-15844.08.patch
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)