[ https://issues.apache.org/jira/browse/KAFKA-17533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rafal updated KAFKA-17533:
--------------------------
    Description: 
After upgrading kafka.version from 3.3.2 to 3.6.1, we observed the following
issue: our service has a thread pool with 32 threads, and eventually all of
these threads got blocked in Kafka Streams code. Stack trace:
{code:java}
   org.apache.kafka.streams.state.internals.RocksDBStore.get line: 397
   org.apache.kafka.streams.state.internals.RocksDBStore.get line: 84
   org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$get$5 line: 319
   org.apache.kafka.streams.state.internals.MeteredKeyValueStore$$Lambda$1454/0x0000000100970440.get line: not available
   org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency line: 887
   org.apache.kafka.streams.state.internals.MeteredKeyValueStore.get line: 319
   org.apache.kafka.streams.state.internals.ReadOnlyKeyValueStoreFacade.get line: 35
   org.apache.kafka.streams.state.internals.CompositeReadOnlyKeyValueStore.get line: 56
{code}
The issue doesn't happen immediately, i.e. the service can work fine for hours
or days, but at some point one thread gets blocked, and eventually all threads
in the pool get blocked on the same line of Kafka Streams code.

Please see the attached screenshot for a view of all thread pool threads
blocked. It was taken using Azul Mission Control, and all of these threads are
shown when the "Deadlock detection" checkbox is selected, which seems to suggest
there is a deadlock within Kafka Streams code.
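The JVM's deadlock detector can also be queried programmatically through the
standard java.lang.management.ThreadMXBean API, which may help tell a true lock
cycle apart from threads that are merely stuck. Note that findDeadlockedThreads()
only reports threads in a cycle waiting to acquire object monitors or ownable
synchronizers (such as ReentrantLock); threads waiting in a native call or on a
condition would not be reported. The sketch below is an assumption-laden
illustration: the class DeadlockCheck and its demo, which deliberately builds a
two-lock cycle so the check has something to find, are hypothetical and not part
of Kafka Streams.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockCheck {

    /** Ids of threads in a monitor/ownable-synchronizer cycle, or an empty array. */
    static long[] findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        return ids == null ? new long[0] : ids;
    }

    /** Prints each deadlocked thread, the lock it waits for, and that lock's owner. */
    static void report(long[] ids) {
        if (ids.length == 0) {
            System.out.println("No deadlock cycle detected");
            return;
        }
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread terminated between the two calls
            }
            System.out.printf("%s (%s) waiting on %s held by %s%n",
                    info.getThreadName(), info.getThreadState(),
                    info.getLockName(), info.getLockOwnerName());
        }
    }

    /** Hypothetical demo: two daemon threads take two monitors in opposite order. */
    static long[] demo() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(a, b, bothHoldFirst), "demo-1");
        Thread t2 = new Thread(() -> lockBoth(b, a, bothHoldFirst), "demo-2");
        t1.setDaemon(true); // daemons, so the stuck pair does not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();
        long[] ids = findDeadlocks();
        try {
            for (int i = 0; i < 100 && ids.length == 0; i++) {
                Thread.sleep(50);
                ids = findDeadlocks();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ids;
    }

    private static void lockBoth(Object first, Object second, CountDownLatch bothHoldFirst) {
        synchronized (first) {
            bothHoldFirst.countDown();
            try {
                bothHoldFirst.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                System.out.println("unreachable once both threads hold their first lock");
            }
        }
    }

    public static void main(String[] args) {
        report(findDeadlocks()); // the current JVM, before the demo
        report(demo());          // what a genuine lock cycle looks like
    }
}
```

Running report(findDeadlocks()) inside the affected service, for example from a
diagnostic endpoint, would show whether the blocked pool threads form an actual
lock cycle or are stuck for some other reason.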

After rolling back to 3.3.2, the issue went away.

I'm wondering whether this is a known issue and, if so, whether it was fixed in
any version after 3.6.1.

 

  was:
After upgrading kafka.version from 3.3.2 to 3.6.1, we observed the following
issue: our service has a thread pool with 32 threads, and eventually all of
these threads got blocked in Kafka Streams code. Stack trace:
{code:java}
   org.apache.kafka.streams.state.internals.RocksDBStore.get line: 397
   org.apache.kafka.streams.state.internals.RocksDBStore.get line: 84
   org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$get$5 line: 319
   org.apache.kafka.streams.state.internals.MeteredKeyValueStore$$Lambda$1454/0x0000000100970440.get line: not available
   org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency line: 887
   org.apache.kafka.streams.state.internals.MeteredKeyValueStore.get line: 319
   org.apache.kafka.streams.state.internals.ReadOnlyKeyValueStoreFacade.get line: 35
   org.apache.kafka.streams.state.internals.CompositeReadOnlyKeyValueStore.get line: 56
{code}
The issue doesn't happen immediately, i.e. the service can work fine for hours
or days, but at some point one thread gets blocked, and eventually all threads
in the pool get blocked on the same line of Kafka Streams code.

Please see the attached screenshot for a view of all thread pool threads blocked.

After rolling back to 3.3.2, the issue went away.

I'm wondering whether this is a known issue and, if so, whether it was fixed in
any version after 3.6.1.

 


> Threads get blocked on Kafka Streams RocksDBStore.get() API
> -----------------------------------------------------------
>
>                 Key: KAFKA-17533
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17533
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.6.1
>            Reporter: Rafal
>            Priority: Major
>         Attachments: all_threads_blocked.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
