[ https://issues.apache.org/jira/browse/HIVE-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165443#comment-14165443 ]
Eugene Koifman commented on HIVE-8368:
--------------------------------------
Adding the EXPLAIN output for completeness.
Before the patch:
{noformat}
hive> explain delete from concur_orc_tab where age >= 20 and age < 30;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 depends on stages: Stage-2
Stage-3 depends on stages: Stage-0
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: concur_orc_tab
Filter Operator
predicate: ((age >= 20) and (age < 30)) (type: boolean)
Select Operator
expressions: ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
outputColumnNames: _col0
Reduce Output Operator
key expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
sort order: -
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
outputColumnNames: _col0
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
sort order:
Map-reduce partition columns: UDFToInteger(_col0) (type: int)
value expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
Reduce Operator Tree:
Extract
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.concur_orc_tab
Stage: Stage-0
Move Operator
tables:
replace: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.concur_orc_tab
Stage: Stage-3
Stats-Aggr Operator
Time taken: 0.697 seconds, Fetched: 62 row(s)
{noformat}
After the patch:
{noformat}
hive> explain delete from concur_orc_tab where age >= 20 and age < 30;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
Stage-2 depends on stages: Stage-0
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:
TableScan
alias: concur_orc_tab
Filter Operator
predicate: ((age >= 20) and (age < 30)) (type: boolean)
Select Operator
expressions: ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
outputColumnNames: _col0
Reduce Output Operator
key expressions: _col0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
sort order: +
Map-reduce partition columns: UDFToInteger(_col0) (type: int)
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>)
outputColumnNames: _col0
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.concur_orc_tab
Stage: Stage-0
Move Operator
tables:
replace: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.concur_orc_tab
Stage: Stage-2
Stats-Aggr Operator
Time taken: 0.538 seconds, Fetched: 45 row(s)
{noformat}
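For anyone comparing the two plans, the stage lists can be checked mechanically. The following is a minimal Python sketch, not part of Hive or of the patch: the helper names `parse_stage_dependencies` and `chain_length` are hypothetical, and the input lines are copied from the STAGE DEPENDENCIES sections of the two outputs above.

```python
import re


def parse_stage_dependencies(lines):
    """Map each stage name to the list of stages it depends on."""
    deps = {}
    for line in lines:
        line = line.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            deps[root.group(1)] = []
            continue
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if dep:
            deps[dep.group(1)] = [s.strip() for s in dep.group(2).split(",")]
    return deps


def chain_length(deps):
    """Length of the longest dependency chain, counted in stages."""
    def depth(stage):
        return 1 + max((depth(parent) for parent in deps[stage]), default=0)
    return max(depth(stage) for stage in deps)


# Lines copied from the "Before the patch" output.
BEFORE = [
    "Stage-1 is a root stage",
    "Stage-2 depends on stages: Stage-1",
    "Stage-0 depends on stages: Stage-2",
    "Stage-3 depends on stages: Stage-0",
]

# Lines copied from the "After the patch" output.
AFTER = [
    "Stage-1 is a root stage",
    "Stage-0 depends on stages: Stage-1",
    "Stage-2 depends on stages: Stage-0",
]

before = parse_stage_dependencies(BEFORE)
after = parse_stage_dependencies(AFTER)
print(len(before), chain_length(before))  # 4 4
print(len(after), chain_length(after))    # 3 3
```

On these inputs it reports four stages before the patch and three after, which matches the plans: the second Map Reduce stage is gone and its partitioning on UDFToInteger(_col0) now appears in the Reduce Output Operator of Stage-1.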
> compactor is improperly writing delete records in base file
> -----------------------------------------------------------
>
> Key: HIVE-8368
> URL: https://issues.apache.org/jira/browse/HIVE-8368
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 0.14.0
> Reporter: Alan Gates
> Assignee: Alan Gates
> Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8368.2.patch, HIVE-8368.patch
>
>
> When the compactor reads records from the base and deltas, it is not properly
> dropping delete records. This leads to oversized base files, and possibly to
> wrong query results.
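
The behaviour the description asks for can be illustrated with a simplified model. This is a sketch in Python under stated assumptions, not Hive's compactor code: the `Event` type, its `txn` and `op` fields, and the `major_compact` function are hypothetical, and only the shape of the row identifier (transactionid, bucketid, rowid) is taken from the ROW__ID struct in the plans above.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple

# (transactionid, bucketid, rowid), mirroring the ROW__ID struct.
RowId = Tuple[int, int, int]


class Event(NamedTuple):
    row_id: RowId
    txn: int                 # hypothetical: transaction that wrote this event
    op: str                  # hypothetical: "insert", "update", or "delete"
    payload: Optional[dict]  # row data; None for a delete


def major_compact(base: Iterable[Event],
                  deltas: Iterable[Iterable[Event]]) -> List[Event]:
    """Merge base and delta events into the contents of a new base.

    Keeps only the newest event per row and drops rows whose newest
    event is a delete, so no delete records reach the new base.
    """
    latest = {}
    events = list(base) + [ev for delta in deltas for ev in delta]
    for ev in events:
        current = latest.get(ev.row_id)
        if current is None or ev.txn >= current.txn:
            latest[ev.row_id] = ev
    return [latest[k] for k in sorted(latest) if latest[k].op != "delete"]


base = [
    Event((1, 0, 0), 1, "insert", {"age": 25}),
    Event((1, 0, 1), 1, "insert", {"age": 42}),
]
delta = [Event((1, 0, 0), 2, "delete", None)]

new_base = major_compact(base, [delta])
print([ev.row_id for ev in new_base])  # [(1, 0, 1)]
```

If the final filter on `op` were omitted, the delete event for row (1, 0, 0) would be written into the new base, which is the kind of oversized base file the description refers to.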