Eugene Koifman created HIVE-20901:
-------------------------------------
Summary: running compactor when there is nothing to do produces
duplicate data
Key: HIVE-20901
URL: https://issues.apache.org/jira/browse/HIVE-20901
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Suppose we run minor compaction twice via ALTER TABLE.
The second compaction request should have nothing to do, but I don't think there
is a check for that. It's visible in the context of HIVE-20823, where each
compactor run produces a delta with a new visibility suffix, so we end up with
something like
{noformat}
target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
├── delete_delta_0000001_0000002_v0000019
│ ├── _orc_acid_version
│ └── bucket_00000
├── delete_delta_0000001_0000002_v0000021
│ ├── _orc_acid_version
│ └── bucket_00000
├── delta_0000001_0000001_0000
│ ├── _orc_acid_version
│ └── bucket_00000
├── delta_0000001_0000002_v0000019
│ ├── _orc_acid_version
│ └── bucket_00000
├── delta_0000001_0000002_v0000021
│ ├── _orc_acid_version
│ └── bucket_00000
└── delta_0000002_0000002_0000
  ├── _orc_acid_version
  └── bucket_00000
{noformat}
i.e. 2 deltas with the same write ID range.
This is bad. It probably happens today as well, but there the new run produces a delta with
the same name and clobbers the previous one, which may interfere with writers.
Need to investigate.
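
To make the symptom easier to spot, here is a rough sketch that groups compactor output directories by write ID range and reports any range that appears more than once. It is not Hive code: the class name DuplicateDeltaCheck, the method findDuplicateRanges and the directory-name pattern are assumptions inferred only from the listing above, and directories without a _v suffix are simply skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: flags compacted deltas that share a write ID range.
public class DuplicateDeltaCheck {
    // Assumed naming, taken from the listing in this report:
    //   (delete_delta|delta)_<minWriteId>_<maxWriteId>_v<visibilityTxnId>
    private static final Pattern COMPACTED =
        Pattern.compile("^(delete_delta|delta)_(\\d+)_(\\d+)_v(\\d+)$");

    // Returns a map from "<type>_<min>_<max>" to the directories that share that range,
    // keeping only ranges that occur in two or more directories.
    public static Map<String, List<String>> findDuplicateRanges(List<String> dirNames) {
        Map<String, List<String>> byRange = new LinkedHashMap<>();
        for (String name : dirNames) {
            Matcher m = COMPACTED.matcher(name);
            if (!m.matches()) {
                continue; // no visibility suffix, so not treated as compactor output here
            }
            String key = m.group(1) + "_" + m.group(2) + "_" + m.group(3);
            byRange.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        // Drop ranges that are covered by a single directory.
        byRange.values().removeIf(dirs -> dirs.size() < 2);
        return byRange;
    }

    public static void main(String[] args) {
        // Directory names copied from the listing above.
        List<String> dirs = List.of(
            "delete_delta_0000001_0000002_v0000019",
            "delete_delta_0000001_0000002_v0000021",
            "delta_0000001_0000001_0000",
            "delta_0000001_0000002_v0000019",
            "delta_0000001_0000002_v0000021",
            "delta_0000002_0000002_0000");
        findDuplicateRanges(dirs).forEach(
            (range, names) -> System.out.println(range + " -> " + names));
    }
}
```

For the listing above this reports two duplicated ranges, one for delta and one for delete_delta, each covered by the v0000019 and v0000021 directories. An actual fix would presumably belong in the compactor itself (skip the run when there is nothing new to compact) rather than in an after-the-fact check like this.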