tanishqchugh created HIVE-29210:
-----------------------------------

             Summary: Minor compaction produces duplicates conditionally in case of HMS instance running initiator crash
                 Key: HIVE-29210
                 URL: https://issues.apache.org/jira/browse/HIVE-29210
             Project: Hive
          Issue Type: Bug
            Reporter: tanishqchugh
            Assignee: tanishqchugh

In a deployment with multiple HiveServer2 (HS2) instances, one HS2 instance may run on the same host as the Hive Metastore (HMS). In this setup the initiator runs within HMS, while the compaction worker threads run within HS2.

If the HMS instance crashes unexpectedly, revokeFromLocalWorkers() is invoked. This method resets every compaction job that was running on the same host back to the initiated state. We believe this behavior is by design: if HMS and the HS2 instance running the workers crashed simultaneously and the jobs were not reset, those compactions could remain stalled until revokeTimedoutWorkers() eventually reclaims them.

However, when HMS crashes but the HS2 instance survives, the reset still occurs. The job therefore becomes available for reassignment even though the original HS2 worker is still actively processing it. Another HS2 worker can then pick up the same compaction task, so two workers run the same minor compaction job concurrently. This race condition can intermittently result in duplicate records being written to the table.
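The sequence described above can be illustrated with a small in-memory model. This is only a sketch and not Hive's actual code: the class CompactionRaceSketch, its Entry type, the claimNext() helper, and the host and worker names are all hypothetical, and the signature given here to revokeFromLocalWorkers() is an assumption made for illustration. The model only shows how a host-based reset lets a second worker claim a job that a surviving worker is still processing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of a compaction queue; NOT Hive's real implementation.
class CompactionRaceSketch {
    enum State { INITIATED, WORKING }

    // One queue entry: its state and the host of the worker that claimed it.
    static final class Entry {
        State state = State.INITIATED;
        String workerHost; // null while the job is unclaimed
    }

    private final Map<Long, Entry> queue = new LinkedHashMap<>();
    // Every worker that ever started a given job, recorded to expose overlap.
    private final Map<Long, List<String>> startedBy = new LinkedHashMap<>();

    void enqueue(long id) {
        queue.put(id, new Entry());
        startedBy.put(id, new ArrayList<>());
    }

    // A worker claims the first job that is still in the INITIATED state.
    Optional<Long> claimNext(String workerName, String host) {
        for (Map.Entry<Long, Entry> e : queue.entrySet()) {
            Entry entry = e.getValue();
            if (entry.state == State.INITIATED) {
                entry.state = State.WORKING;
                entry.workerHost = host;
                startedBy.get(e.getKey()).add(workerName);
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    // Models the reset from the description (assumed signature): every job
    // claimed from the given host goes back to INITIATED, whether or not the
    // worker that claimed it is still alive.
    int revokeFromLocalWorkers(String host) {
        int reset = 0;
        for (Entry entry : queue.values()) {
            if (entry.state == State.WORKING && host.equals(entry.workerHost)) {
                entry.state = State.INITIATED;
                entry.workerHost = null;
                reset++;
            }
        }
        return reset;
    }

    List<String> workersFor(long id) {
        return startedBy.get(id);
    }

    public static void main(String[] args) {
        CompactionRaceSketch q = new CompactionRaceSketch();
        q.enqueue(1L);

        // An HS2 worker on the HMS host claims the job and keeps running.
        q.claimNext("hs2-a-worker", "host1");

        // HMS on host1 crashes; the job is reset although hs2-a-worker survived.
        q.revokeFromLocalWorkers("host1");

        // A worker on another HS2 instance now claims the same job.
        q.claimNext("hs2-b-worker", "host2");

        // Prints [hs2-a-worker, hs2-b-worker]: both workers process job 1.
        System.out.println(q.workersFor(1L));
    }
}
```

In this model the reset has no way to tell a dead worker from a live one, because it looks only at the host recorded on the entry. That is the gap the description points at: a reset keyed on the host alone cannot distinguish a crash of HMS only from a crash of both HMS and HS2.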