Evgheni Popusoi created KAFKA-19629:
---------------------------------------

             Summary: Deadlock in Kafka Streams when processing Interactive Queries and state store updates concurrently
                 Key: KAFKA-19629
                 URL: https://issues.apache.org/jira/browse/KAFKA-19629
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 3.9.1, 3.8.1
         Environment: Kafka Streams, Kotlin, Linux, Docker, JDK 21
            Reporter: Evgheni Popusoi
         Attachments: thread-dump-1.txt, thread-dump-2.txt

We are using a Kafka Streams topology that continuously writes large volumes of 
data into a RocksDB state store with stable throughput. In parallel, another 
thread executes Interactive Query (IQ) requests against the same local state 
store.

When the number of queued IQ requests grows to roughly 50 or more (≈50+), the application {*}deadlocks{*}.

*Investigation:*
A thread dump revealed a lock-order inversion between two {{RocksDBStore}} operations:
 * {{RocksDBStore.put}}
 ** holding {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
 ** blocked on {{org.apache.kafka.streams.query.Position@4ba00b6c}}
 * {{RocksDBStore.range}}
 ** holding {{org.apache.kafka.streams.query.Position@4ba00b6c}}
 ** blocked on {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}

This indicates that {*}{{put}} and {{range}} acquire the same two locks in opposite order{*}: each thread holds the lock the other is waiting for, so under concurrent load both block indefinitely.
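To illustrate the inversion outside Kafka, here is a minimal Java sketch, assuming the two monitors from the thread dump can be modeled as plain objects. {{LockInversionDemo}}, {{storeLock}} and {{positionLock}} are hypothetical names, not Kafka Streams internals, and the real {{put}}/{{range}} code paths are not reproduced. A {{CountDownLatch}} forces both threads to take their first lock before either attempts its second, and the JDK's {{ThreadMXBean.findDeadlockedThreads()}} then reports the cycle, the same condition a thread dump flags as a Java-level deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockInversionDemo {

    /**
     * Starts a "put" thread and a "range" thread that take two monitors in
     * opposite order, then asks the JVM how many threads are deadlocked.
     */
    static int countDeadlockedThreads() {
        // Hypothetical stand-ins for the monitors in the thread dump;
        // these are plain objects, NOT the real Kafka classes.
        Object storeLock = new Object();    // plays RocksDBStore@414cff0e
        Object positionLock = new Object(); // plays Position@4ba00b6c

        // Each thread waits here after taking its first lock, so the
        // inversion is hit on every run instead of only under heavy load.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread put = new Thread(() -> {
            synchronized (storeLock) {            // put: store lock first...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (positionLock) {     // ...then position lock
                    System.out.println("put finished (unreachable)");
                }
            }
        }, "put-thread");

        Thread range = new Thread(() -> {
            synchronized (positionLock) {         // range: position lock first...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (storeLock) {        // ...then store lock
                    System.out.println("range finished (unreachable)");
                }
            }
        }, "range-thread");

        // Daemon threads let the JVM exit even though these two never finish.
        put.setDaemon(true);
        range.setDaemon(true);
        put.start();
        range.start();

        // Poll the JVM's deadlock detector for up to ~5 seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) {
            long[] ids = mx.findDeadlockedThreads(); // null if no deadlock
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

Running {{main}} should print {{deadlocked threads: 2}}: both threads are in the cycle, which matches the holding/blocked-on pairs listed above.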

*Expected Behavior:*
The Kafka Streams API should guarantee deadlock-free operation. Store writes ({{put}}) and IQ reads ({{range}}) may contend for the same locks, but they should never acquire them in an order that allows a lock inversion.
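One conventional way to meet this expectation is a single global lock order. The Java sketch below is a hypothetical model, not the actual Kafka Streams code or its fix for this issue: {{OrderedLockingDemo}}, {{storeLock}}, {{positionLock}} and {{runConcurrently}} are illustrative names. Both {{put()}} and {{range()}} take {{storeLock}} before {{positionLock}}, so no wait-for cycle can form, even with one writer and about 50 concurrent readers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class OrderedLockingDemo {
    // Hypothetical model of a store: two plain monitors, NOT Kafka's classes.
    private final Object storeLock = new Object();
    private final Object positionLock = new Object();
    private long position = 0;                       // guarded by positionLock
    private final AtomicLong completedReads = new AtomicLong();

    // Write path: storeLock, then positionLock (the single global order).
    void put() {
        synchronized (storeLock) {
            synchronized (positionLock) {
                position++;
            }
        }
    }

    // Read path: the SAME order as put(), so no wait-for cycle can form.
    long range() {
        synchronized (storeLock) {
            synchronized (positionLock) {
                completedReads.incrementAndGet();
                return position;
            }
        }
    }

    /** One writer plus `readers` query threads; true if all finish in time. */
    static boolean runConcurrently(int readers, int opsPerThread) {
        OrderedLockingDemo store = new OrderedLockingDemo();
        ExecutorService pool = Executors.newFixedThreadPool(readers + 1);
        pool.submit(() -> {
            for (int i = 0; i < opsPerThread; i++) store.put();
        });
        for (int r = 0; r < readers; r++) {
            pool.submit(() -> {
                for (int i = 0; i < opsPerThread; i++) store.range();
            });
        }
        pool.shutdown();
        try {
            boolean finished = pool.awaitTermination(30, TimeUnit.SECONDS);
            // Every reader must have completed all of its range() calls.
            return finished
                    && store.completedReads.get() == (long) readers * opsPerThread;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // ~50 concurrent readers mirrors the queue depth in the report.
        System.out.println("completed: " + runConcurrently(50, 10_000));
    }
}
```

The trade-off is that reads and writes fully serialize on {{storeLock}}. Whether that is acceptable, or whether {{range}} should instead avoid holding one lock while acquiring the other, depends on {{RocksDBStore}} internals that this report does not cover.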

*Steps to Reproduce:*
 # Create a Kafka Streams topology with a RocksDB state store that receives continuous writes.
 # In a parallel thread, issue a large number of Interactive Query {{range}} requests (≈50+ queued).
 # Observe that the application eventually deadlocks.


*Impact:*
 * The application stops processing data.
 * Interactive Queries fail indefinitely.
 * A manual restart is required to recover.

*Notes:*
 * This appears to be a lock-ordering bug in {{RocksDBStore}}.
 * We expected the Streams API to coordinate thread safety and prevent such deadlocks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
