Evgheni Popusoi created KAFKA-19629:
---------------------------------------

             Summary: Deadlock in Kafka Streams when processing Interactive Queries and state store updates concurrently
                 Key: KAFKA-19629
                 URL: https://issues.apache.org/jira/browse/KAFKA-19629
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 3.9.1, 3.8.1
         Environment: Kafka Streams, kotlin, linux, docker. JDK 21
            Reporter: Evgheni Popusoi
         Attachments: thread-dump-1.txt, thread-dump-2.txt


We are using a Kafka Streams topology that continuously writes large volumes of data into a RocksDB state store with stable throughput. In parallel, another thread executes Interactive Query (IQ) requests against the same local state store.

When the number of IQ requests in the queue grows (≈50+), the application enters a *deadlock state*.

*Investigation:*

Using a thread dump, we discovered a lock inversion between RocksDB operations:
 * {{RocksDBStore.put}}
 ** blocked on {{org.apache.kafka.streams.query.Position@4ba00b6c}}
 ** holding {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
 * {{RocksDBStore.range}}
 ** blocked on {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
 ** holding {{org.apache.kafka.streams.query.Position@4ba00b6c}}

This indicates that *{{put}} and {{range}} acquire the same locks but in different order*, which leads to deadlock under concurrent load.

*Expected Behavior:*

Kafka Streams API should guarantee deadlock-free operation. Store writes ({{put}}) and IQ reads ({{range}}) should not block each other in a way that leads to lock inversion.

*Steps to Reproduce:*
 # Create a Kafka Streams topology with a RocksDB state store receiving continuous writes.
 # In a parallel thread, issue a high number of Interactive Query {{range}} requests (≈50+ queued).
 # Observe that the system eventually enters deadlock.

*Impact:*
 * Application stops processing data.
 * Interactive Queries fail indefinitely.
 * Requires manual restart to recover.

*Notes:*
 * Appears to be a lock ordering bug in {{RocksDBStore}}.
 * Expected the Streams API to coordinate thread-safety and prevent such deadlocks.
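
To illustrate the lock-order inversion described under Investigation, here is a minimal Java sketch. It does not use or reproduce Kafka Streams internals: {{LockOrderSketch}}, {{storeLock}}, {{positionLock}}, {{rangeInverted}}, {{rangeOrdered}} and {{runOrdered}} are hypothetical names, and the two plain monitors only stand in for the {{RocksDBStore}} and {{Position}} objects seen in the thread dump. The sketch assumes that giving both paths the same acquisition order is one way to remove the cycle; it is not a claim about how the issue is or should be fixed in Kafka.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: the two monitors are stand-ins, not Kafka Streams classes. */
final class LockOrderSketch {
    private final Object storeLock = new Object();    // stands in for the RocksDBStore monitor
    private final Object positionLock = new Object(); // stands in for the Position monitor
    private final AtomicLong completed = new AtomicLong();

    // Writer path: store first, then position (the order the dump shows for put).
    void put() {
        synchronized (storeLock) {
            synchronized (positionLock) {
                completed.incrementAndGet();
            }
        }
    }

    // Reader path in the order the dump shows for range: position first, then store.
    // Run concurrently with put(), this can deadlock; it is never called below.
    void rangeInverted() {
        synchronized (positionLock) {
            synchronized (storeLock) {
                completed.incrementAndGet();
            }
        }
    }

    // Reader path using the same order as put(), so no lock cycle can form.
    void rangeOrdered() {
        synchronized (storeLock) {
            synchronized (positionLock) {
                completed.incrementAndGet();
            }
        }
    }

    // Runs one writer and one reader concurrently and returns the number of completed operations.
    static long runOrdered(int iterations) {
        LockOrderSketch sketch = new LockOrderSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { for (int i = 0; i < iterations; i++) sketch.put(); });
        pool.submit(() -> { for (int i = 0; i < iterations; i++) sketch.rangeOrdered(); });
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sketch.completed.get();
    }
}
```

Calling {{LockOrderSketch.runOrdered(10000)}} runs both paths to completion because they share one acquisition order. Replacing {{rangeOrdered}} with {{rangeInverted}} in {{runOrdered}} reintroduces the inversion and may hang until the timeout, depending on thread scheduling.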