tanishqchugh created HIVE-29210:
-----------------------------------

             Summary: Minor compaction produces duplicates conditionally in 
case of HMS instance running initiator crash
                 Key: HIVE-29210
                 URL: https://issues.apache.org/jira/browse/HIVE-29210
             Project: Hive
          Issue Type: Bug
            Reporter: tanishqchugh
            Assignee: tanishqchugh


In a deployment with multiple HiveServer2 (HS2) instances, one HS2 instance 
may run on the same host as the Hive Metastore (HMS). In this setup, the 
initiator runs within HMS, while the compaction worker threads run within HS2.

If the HMS instance unexpectedly crashes, the method revokeFromLocalWorkers() 
is invoked. This method resets to the initiated state every compaction job 
that was running on the same host as HMS. We believe this behavior is by 
design: if HMS and the HS2 instance running the workers crashed together and 
the jobs were not reset, those compactions could remain stalled until 
revokeTimedoutWorkers() eventually reclaims them.

However, in the case where HMS crashes but the HS2 instance survives, the reset 
still occurs. As a result, the job is made available for reassignment even 
though the original HS2 worker is still actively processing it. This can lead 
to a scenario where another HS2 worker picks up the same compaction task, 
causing two workers to run the same minor compaction job concurrently.
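
The sequence above can be sketched with a small in-memory model. This is only 
an illustration, not Hive code: the Job class, the claim() helper, the host 
and worker names, and revokeFromLocalWorkersModel() are all hypothetical 
stand-ins for the compaction queue and for the reset described in this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionRaceSketch {
    enum State { INITIATED, WORKING }

    // Hypothetical stand-in for a single compaction queue entry.
    static final class Job {
        State state = State.INITIATED;
        String workerHost; // host of the worker that last claimed the job
        // Workers that are still processing the job, whatever the queue says.
        final List<String> activeWorkers = new ArrayList<>();
    }

    // A worker may claim a job only while it is in the INITIATED state.
    static boolean claim(Job job, String workerId, String host) {
        if (job.state != State.INITIATED) {
            return false;
        }
        job.state = State.WORKING;
        job.workerHost = host;
        job.activeWorkers.add(workerId);
        return true;
    }

    // Assumed model of the reset: any job being worked on from the HMS host
    // goes back to INITIATED. The surviving worker is not notified, so
    // activeWorkers is deliberately left untouched.
    static void revokeFromLocalWorkersModel(Job job, String hmsHost) {
        if (job.state == State.WORKING && hmsHost.equals(job.workerHost)) {
            job.state = State.INITIATED;
            job.workerHost = null;
        }
    }

    static Job simulate() {
        Job job = new Job();
        claim(job, "hs2-a-worker", "host1");        // HS2 co-located with HMS
        revokeFromLocalWorkersModel(job, "host1");  // HMS crashed, HS2 survived
        claim(job, "hs2-b-worker", "host2");        // a second HS2 claims the job
        return job;
    }

    public static void main(String[] args) {
        Job job = simulate();
        System.out.println("workers on the same job: " + job.activeWorkers);
    }
}
```

In this model both workers end up in activeWorkers for the same job, which 
corresponds to the concurrent minor compaction described above.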

This race condition can intermittently result in duplicate records being 
written to the table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
