jmint-stripe opened a new issue #7874:
URL: https://github.com/apache/pinot/issues/7874


   We observed some realtime ingestion lag on one of our Pinot clusters. After 
some investigation we determined that the lag was happening on a subset of the 
partitions for the Kafka stream we were ingesting from.
   
   Analyzing the logs showed that a temporary ZooKeeper connection issue 
triggered a cascade of `InterruptedException`s, which caused some segment state 
transitions from `OFFLINE` to `CONSUMING` to fail. 
   
   Some relevant log messages:
   
   ```
   2021/11/30 01:55:15.334 WARN [ZKHelixManager] [HelixTaskExecutor-message_handle_STATE_TRANSITION] zkClient to [redacted] is not connected, wait for 10000ms.
   ```
   
   ```
   Exception while executing a state transition task [redacted segment name]
       ...
       Caused by: java.lang.RuntimeException: InterruptedException when acquiring the partitionConsumerSemaphore for segment: [redacted segment name]
   ```
   
   ```
   2021/11/30 01:55:15.334 ERROR [HelixTask] [HelixTaskExecutor-message_handle_STATE_TRANSITION] Exception after executing a message, msgId: 76da755d-4ae3-4d61-84e6-11a946f6bffcorg.I0Itec.zkclient.exception.ZkInterruptedException: java.lang.InterruptedException
       org.I0Itec.zkclient.exception.ZkInterruptedException: java.lang.InterruptedException
               at org.apache.helix.manager.zk.zookeeper.ZkClient.acquireEventLock(ZkClient.java:1142)
       ...
   ```
   
   As a result, consumption stopped for the partitions whose segments had 
failed state transitions.
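   The following is a minimal, self-contained Java sketch of the failure mode described above. It is not Pinot's actual code. The class name `InterruptedTransitionDemo`, the method `transitionOfflineToConsuming`, and the segment name `exampleSegment` are hypothetical. Only the error message text is copied from the log above. The sketch shows what happens when a thread blocked in `java.util.concurrent.Semaphore.acquire()` is interrupted. Here an explicit `interrupt()` call stands in for the interrupt seen during the ZooKeeper disruption. The thread gets an `InterruptedException`, and if the transition handler wraps it in a `RuntimeException`, the transition fails. In this sketch nothing retries it afterwards.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;

public class InterruptedTransitionDemo {

  // Hypothetical stand-in for an OFFLINE -> CONSUMING handler: it blocks on a
  // per-partition semaphore and turns an interrupt into a RuntimeException,
  // which aborts the transition.
  static void transitionOfflineToConsuming(Semaphore semaphore, String segmentName) {
    try {
      semaphore.acquire();
    } catch (InterruptedException e) {
      // Restore the interrupt flag before propagating, as is conventional.
      Thread.currentThread().interrupt();
      throw new RuntimeException(
          "InterruptedException when acquiring the partitionConsumerSemaphore for segment: "
              + segmentName, e);
    }
    try {
      // Consumption would start here in a real handler; omitted in this sketch.
    } finally {
      semaphore.release();
    }
  }

  // Runs one interrupted transition and returns the failure it produced (or null).
  static Throwable runDemo() {
    Semaphore semaphore = new Semaphore(1);
    // Simulate a permit still held by another consumer of the same partition.
    semaphore.acquireUninterruptibly();
    AtomicReference<Throwable> failure = new AtomicReference<>();
    Thread transitionThread = new Thread(() -> {
      try {
        transitionOfflineToConsuming(semaphore, "exampleSegment");
      } catch (RuntimeException e) {
        failure.set(e);
      }
    });
    transitionThread.start();
    try {
      // Wait until the transition thread is queued inside acquire().
      while (!semaphore.hasQueuedThreads()) {
        Thread.sleep(5);
      }
      // Stand-in for the interrupt delivered during the ZooKeeper disruption.
      transitionThread.interrupt();
      transitionThread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      semaphore.release();
    }
    return failure.get();
  }

  public static void main(String[] args) {
    Throwable failure = runDemo();
    System.out.println(failure.getMessage());
  }
}
```

   The failed transition is simply reported; the segment is not moved to `CONSUMING` later, even though the semaphore becomes available right afterwards.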
   
   To get the servers consuming those partitions again, we had to restart the 
servers hosting those segments. We expect Pinot to recover on its own and 
resume consuming once the ZooKeeper connection is restored.

