[GitHub] [pulsar] hepyu opened a new issue #12533: What could cause the p999 write latency of bookie entries to reach 5 seconds?

GitBox Fri, 29 Oct 2021 01:35:50 -0700


hepyu opened a new issue #12533:
URL: https://github.com/apache/pulsar/issues/12533



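   Setup: a pulsar cluster of 3 nodes, each with 8 cores and 16 GB of memory
   
   Symptom & question: on one node, the P999 entry write latency reaches 5 seconds and stays there for at least 30 seconds, while writes on the other two nodes are normal. What could cause this?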
   
   
![image](https://user-images.githubusercontent.com/3917749/139403345-40bae858-fef9-4a8e-a662-57f11d47996f.png)
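   For reading the P999 figure, here is a minimal sketch. It assumes a nearest-rank percentile (the bookie metrics may estimate quantiles differently) and hypothetical latencies of 2 ms for a normal write and 5 s for a stalled one; the 4000 writes per window is likewise only an illustrative number. Under those assumptions, a handful of stalled writes per window is enough to move P999 to 5 seconds.

```python
def p999(samples):
    """Nearest-rank 99.9th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # ceil(n * 0.999) computed with integers to avoid float rounding.
    rank = (len(ordered) * 999 + 999) // 1000
    return ordered[rank - 1]


def window(slow_count, total=4000, fast_ms=2.0, slow_ms=5000.0):
    """Hypothetical window: `slow_count` stalled writes, the rest fast."""
    return [fast_ms] * (total - slow_count) + [slow_ms] * slow_count


if __name__ == "__main__":
    for slow in (0, 4, 5, 40):
        print(f"stalled writes={slow:3d}  p999={p999(window(slow))} ms")
```

   With 4000 samples, P999 stays at the fast value while at most 4 writes are stalled and jumps to the stalled value from the 5th stalled write on, so a 5 second P999 does not by itself mean that most writes were slow.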
   
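   Is the bookie migrating data between nodes? Or is it writing a large amount of journal data out to multiple ledgers during this window?
   
   Other data:
   Write TPS: 3,500~4,000
   Read TPS: 15,000~20,000
   
   Anomalies during this 30~60 second window:
   Of 1130 send failures, 912 were network timeouts and 218 were caused by the pulsar send queue being full.
   A closer look at the logs shows that the 912 network timeouts came first and the 218 timeouts caused by the full pulsar-queue came afterwards, which suggests a cause-and-effect relationship.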
   
![image](https://user-images.githubusercontent.com/3917749/139403434-fd1cb54e-b8ed-44bf-a473-100562eb54c3.png)
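   The ordering check described above can be sketched as follows. The record format `(timestamp_seconds, kind)` and the timestamps are hypothetical, since the real client log format is not shown here; only the counts 912 and 218 come from the observation above.

```python
from collections import Counter


def summarize(failures):
    """Return (counts per kind, first timestamp per kind) for (ts, kind) records."""
    records = list(failures)
    counts = Counter(kind for _, kind in records)
    first_seen = {}
    for ts, kind in records:
        first_seen[kind] = min(ts, first_seen.get(kind, ts))
    return counts, first_seen


def timeouts_precede_queue_full(failures):
    """True if the first timeout was logged before the first queue-full error."""
    _, first_seen = summarize(failures)
    if "timeout" not in first_seen or "queue_full" not in first_seen:
        return False
    return first_seen["timeout"] < first_seen["queue_full"]


def synthetic_failures():
    """Made-up timestamps; only the two counts match the reported numbers."""
    timeouts = [(30.0 + i * 0.01, "timeout") for i in range(912)]
    queue_full = [(40.0 + i * 0.05, "queue_full") for i in range(218)]
    return timeouts + queue_full


if __name__ == "__main__":
    counts, first_seen = summarize(synthetic_failures())
    print(dict(counts), first_seen)
    print("timeouts first:", timeouts_precede_queue_full(synthetic_failures()))
```

   Temporal order alone is only a hint at causality, so this is a consistency check rather than a proof.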
   
   Bookie log of the abnormal node during the abnormal window:
   
![image](https://user-images.githubusercontent.com/3917749/139403491-4205044e-8d76-4dd7-8dc8-aa16daa84363.png)
   
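   The time from the journal to the ledger write looks too long, doesn't it?
   Is this related? This is the operation that moves data from the journal (write-ahead log) to the persistent log (ledger).
   
   The journal write latency in the same period is also 5 seconds.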
   
![image](https://user-images.githubusercontent.com/3917749/139403582-a18122de-9317-42ce-93ce-e2b0f81a6674.png)
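   One way to check whether the disk itself stalls is a small fsync probe. This is a rough sketch, assuming the journal is synced to disk on write; it does not reproduce how the bookie batches journal writes, and the directory to probe is a placeholder (the system temp directory here, to be replaced with the journal disk of the abnormal node).

```python
import os
import tempfile
import time


def fsync_latencies(directory, writes=100, payload_size=4096):
    """Time `writes` append+fsync cycles on a scratch file, in milliseconds."""
    payload = b"\0" * payload_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the scratch file
    return latencies


if __name__ == "__main__":
    # Placeholder directory: point this at the disk that holds the journal.
    lat = sorted(fsync_latencies(tempfile.gettempdir()))
    print(f"median={lat[len(lat) // 2]:.2f} ms  max={lat[-1]:.2f} ms")
```

   If the maximum occasionally reaches seconds on the abnormal node but not on the other two, the stall would be below the bookie, at the disk or filesystem level.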
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
