Peter Varga created HIVE-24481:
----------------------------------

             Summary: Skipped compaction can cause data corruption with streaming
                 Key: HIVE-24481
                 URL: https://issues.apache.org/jira/browse/HIVE-24481
             Project: Hive
          Issue Type: Bug
            Reporter: Peter Varga
            Assignee: Peter Varga

Timeline:
1. Create a partitioned table and add one static partition.
2. Transaction 1 writes delta_1 and aborts.
3. Create a streaming connection with a transaction batch size of 3, using withStaticPartitionValues with the existing partition.
4. beginTransaction, write, commitTransaction
5. beginTransaction, write, abortTransaction
6. beginTransaction, write, commitTransaction
7. Close the connection; the row count of the table is 2.
8. Run a manual minor compaction on the partition. It skips the compaction because the delta count is 1, but still triggers cleaning because of the aborted transaction 1.
9. The cleaner removes both aborted records from txn_components.
10. Wait for acidhousekeeper to remove the empty aborted transactions.
11. select * from table returns *3* records, i.e. it reads the record written by the aborted transaction.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
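
The timeline above can be illustrated with a small toy model. This is only a sketch of the reasoning, not Hive code: ToyMetastore, Row, read and reproduce are hypothetical names, the directory name delta_2_4 is an assumption about how the three batched streaming transactions share one delta, and the model assumes that a reader treats any transaction with no remaining aborted record as committed and that the cleaner deletes the aborted delta_1 directory.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    txn_id: int
    value: str


@dataclass
class ToyMetastore:
    """Toy stand-in for the transaction metadata; NOT Hive's real schema."""
    high_watermark: int = 0
    aborted: set = field(default_factory=set)

    def open_txn(self) -> int:
        self.high_watermark += 1
        return self.high_watermark

    def abort(self, txn_id: int) -> None:
        self.aborted.add(txn_id)

    def is_visible(self, txn_id: int) -> bool:
        # Assumption: a transaction at or below the high watermark that has
        # no aborted record left is treated as committed by readers.
        return txn_id <= self.high_watermark and txn_id not in self.aborted


def read(deltas: dict, ms: ToyMetastore) -> list:
    """Return the values of all rows whose writing transaction is visible."""
    return [row.value
            for rows in deltas.values()
            for row in rows
            if ms.is_visible(row.txn_id)]


def reproduce():
    ms = ToyMetastore()
    deltas = {}

    # Step 2: transaction 1 writes delta_1 and aborts.
    t1 = ms.open_txn()
    deltas["delta_1"] = [Row(t1, "from-aborted-txn-1")]
    ms.abort(t1)

    # Steps 3-6: a streaming batch of 3 transactions writes into one shared
    # delta (name assumed); the middle transaction aborts.
    batch = [ms.open_txn() for _ in range(3)]
    deltas["delta_2_4"] = [Row(batch[0], "a"), Row(batch[1], "b"), Row(batch[2], "c")]
    ms.abort(batch[1])

    before = read(deltas, ms)  # step 7: only the committed rows are visible

    # Steps 8-10: compaction is skipped, so delta_2_4 is never rewritten.
    # The cleaner drops the aborted delta_1 (assumed) and the metadata about
    # both aborted transactions disappears.
    del deltas["delta_1"]
    ms.aborted.clear()

    after = read(deltas, ms)  # step 11: the aborted row is now visible
    return before, after


if __name__ == "__main__":
    before, after = reproduce()
    print(len(before), len(after))
```

In this model the row written in step 5 survives because the skipped compaction never rewrote the shared delta, while the only metadata marking that transaction as aborted was removed.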