Eugene Koifman created HIVE-20901:

             Summary: running compactor when there is nothing to do produces 
duplicate data
                 Key: HIVE-20901
             Project: Hive
          Issue Type: Bug
          Components: Transactions
    Affects Versions: 4.0.0
            Reporter: Eugene Koifman
            Assignee: Eugene Koifman

Suppose we run minor compaction twice on the same table via ALTER TABLE (e.g. ALTER TABLE <table> COMPACT 'minor').

The second compaction request should have nothing to do, but I don't think 
there is a check for that.  This is visible in the context of HIVE-20823, where 
each compactor run produces a delta with a new visibility suffix, so we end up 
with something like:

{noformat}
├── delete_delta_0000001_0000002_v0000019
│   ├── _orc_acid_version
│   └── bucket_00000
├── delete_delta_0000001_0000002_v0000021
│   ├── _orc_acid_version
│   └── bucket_00000
├── delta_0000001_0000001_0000
│   ├── _orc_acid_version
│   └── bucket_00000
├── delta_0000001_0000002_v0000019
│   ├── _orc_acid_version
│   └── bucket_00000
├── delta_0000001_0000002_v0000021
│   ├── _orc_acid_version
│   └── bucket_00000
└── delta_0000002_0000002_0000
    ├── _orc_acid_version
    └── bucket_00000
{noformat}
i.e. two deltas (and two delete deltas) with the same write ID range.
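
To make the duplication concrete, here is a minimal sketch that groups the directory names from the listing by write ID range and reports any range that appears more than once. The name pattern (kind, min write ID, max write ID, then either a statement ID or a "v" visibility suffix) is an assumption inferred from the listing, and find_duplicate_ranges is a hypothetical helper, not part of Hive.

```python
import re
from collections import defaultdict

# Assumed naming, inferred from the listing in this issue:
#   (delta|delete_delta)_<minWriteId>_<maxWriteId>[_<statementId> | _v<visibilityId>]
DELTA_RE = re.compile(r"^(delete_delta|delta)_(\d+)_(\d+)(?:_(v\d+|\d+))?$")


def find_duplicate_ranges(dir_names):
    """Return {(kind, min, max): [names]} for ranges that occur more than once."""
    groups = defaultdict(list)
    for name in dir_names:
        m = DELTA_RE.match(name)
        if m is None:
            continue  # not a delta directory (e.g. base_N), ignore it
        key = (m.group(1), int(m.group(2)), int(m.group(3)))
        groups[key].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Directory names copied from the listing above.
listing = [
    "delete_delta_0000001_0000002_v0000019",
    "delete_delta_0000001_0000002_v0000021",
    "delta_0000001_0000001_0000",
    "delta_0000001_0000002_v0000019",
    "delta_0000001_0000002_v0000021",
    "delta_0000002_0000002_0000",
]

duplicates = find_duplicate_ranges(listing)
for (kind, lo, hi), names in sorted(duplicates.items()):
    print(kind, lo, hi, names)
```

For this listing the sketch flags both the delta and the delete_delta covering write IDs 1 through 2, each present once per compactor run.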

This is bad.  It probably happens today as well, except that the new run 
produces a delta with the same name and clobbers the previous one, which may 
interfere with writers.


Need to investigate.
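
One possible shape for the missing "nothing to do" check, as a rough sketch only: skip the minor compaction when every non-compacted delta is already covered by an existing compacted delta of the same kind. It assumes that a "v" suffix marks compactor output (as with HIVE-20823) and that coverage can be judged from directory names alone. minor_compaction_needed and parse_delta are hypothetical names, and a real check would have to go through Hive's own ACID directory handling and transaction state.

```python
import re

# Assumed naming, inferred from the listing in this issue.
NAME_RE = re.compile(r"^(delete_delta|delta)_(\d+)_(\d+)(?:_(v\d+|\d+))?$")


def parse_delta(name):
    """Parse a delta directory name; return None for anything else."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    suffix = m.group(4) or ""
    return {
        "kind": m.group(1),
        "lo": int(m.group(2)),
        "hi": int(m.group(3)),
        # Assumption: a visibility suffix (v...) means the compactor wrote it.
        "compacted": suffix.startswith("v"),
    }


def minor_compaction_needed(dir_names):
    """True if some non-compacted delta is not covered by a compacted one."""
    parsed = [p for p in map(parse_delta, dir_names) if p is not None]
    compacted = [p for p in parsed if p["compacted"]]
    for raw in (p for p in parsed if not p["compacted"]):
        covered = any(
            c["kind"] == raw["kind"] and c["lo"] <= raw["lo"] and raw["hi"] <= c["hi"]
            for c in compacted
        )
        if not covered:
            return True  # at least one delta still needs compacting
    return False  # everything is already covered: nothing to do


before = ["delta_0000001_0000001_0000", "delta_0000002_0000002_0000"]
after_first_run = before + [
    "delta_0000001_0000002_v0000019",
    "delete_delta_0000001_0000002_v0000019",
]
print(minor_compaction_needed(before))
print(minor_compaction_needed(after_first_run))
```

With such a check the second request in the scenario above would be a no-op, while a later write (say a new delta for write ID 3) would make compaction necessary again.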
