ankitsultana commented on issue #10552:
URL: https://github.com/apache/pinot/issues/10552#issuecomment-1512529716
@Jackie-Jiang: We saw this again today on one of our clusters. Here is what I found:
1. The logs contained this message about the 235th segment of a Kafka
partition:
```
2023-04-18 06:31:11.871 [my_table__224__236__20230418T0520Z] INFO o.a.pinot.segment.local.utils.tablestate.TableStateUtils - Found 1 unloaded segments: [my_table__64__235__20230417T2021Z] for table: my_table_REALTIME
```
2. In the thread dump, the `consumeLoop` for the 236th segment had already
started and was waiting for all segments to be loaded:
```
"my_table_v3__64__236__20230418T0511Z" #1992 daemon prio=5 os_prio=0
cpu=1911.79ms elapsed=4383.52s tid=0x00007efb10016800 nid=0x967 waiting on
condition [0x00007ee0714fa000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep([email protected]/Native Method)
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:406)
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:647)
at java.lang.Thread.run([email protected]/Thread.java:829)
```
This looks like a deadlock. We can't even reset the segments to get out of it,
because the reset gets stuck acquiring the `_partitionGroupConsumerSemaphore`.
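
To illustrate the pattern I suspect, here is a minimal standalone sketch. It is not Pinot code: `SemaphoreStallSketch`, `resetCanAcquire`, and the variable names are hypothetical, and it assumes the consuming thread holds the partition semaphore while it sleeps waiting for all segments to load, which I have not confirmed from the source.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the suspected stall; none of these names come from Pinot.
class SemaphoreStallSketch {

  // Returns true if a "reset" caller can get the permit within timeoutMs while a
  // consumer thread holds it and polls a condition that never becomes true.
  static boolean resetCanAcquire(long timeoutMs) {
    Semaphore partitionSemaphore = new Semaphore(1);
    // Never set to true: models the one segment that stays unloaded.
    AtomicBoolean allSegmentsLoaded = new AtomicBoolean(false);
    CountDownLatch consumerHoldsPermit = new CountDownLatch(1);

    Thread consumer = new Thread(() -> {
      try {
        partitionSemaphore.acquire();
        consumerHoldsPermit.countDown();
        try {
          // Assumed behavior: sleep-poll until every segment is loaded.
          while (!allSegmentsLoaded.get()) {
            Thread.sleep(10);
          }
        } finally {
          partitionSemaphore.release();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "consumer-segment-236");
    consumer.setDaemon(true);
    consumer.start();

    try {
      // Wait until the consumer really owns the permit, then try to "reset".
      consumerHoldsPermit.await();
      boolean acquired = partitionSemaphore.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
      if (acquired) {
        partitionSemaphore.release();
      }
      return acquired;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      // Stop the consumer so the sketch terminates cleanly.
      consumer.interrupt();
    }
  }

  public static void main(String[] args) {
    System.out.println("reset acquired permit: " + resetCanAcquire(500));
  }
}
```

In this sketch the reset path only gives up because it uses a timed `tryAcquire`; with a plain `acquire()` it would block for as long as the unloaded segment keeps the consumer waiting, which matches what we observe.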
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]