wenbingshen opened a new issue #2941:
URL: https://github.com/apache/bookkeeper/issues/2941


   **BUG REPORT**
   
   ***Describe the bug***
   
   Our compaction rate-limiting configuration is shown below; the configured 
rates are very low.
   **bookkeeper.conf**
   `isThrottleByBytes=true`
   `compactionRateByEntries=1000`
   `compactionRateByBytes=150000`
   
![image](https://user-images.githubusercontent.com/35599757/146189635-da561e67-0173-4e64-a668-3d4282245076.png)
   
   Our BookKeeper cluster normally runs stably, but when we needed to restart a 
bookie, it took half an hour to stop a single bookie node.
   The logs below show what happened. After investigating them, we found that 
the same major compaction task in one of the bookie's GC threads had been 
running for two days without completing. Consecutive entry log removals are 
20 to 30 minutes apart.
   
![image](https://user-images.githubusercontent.com/35599757/146188911-c9bbe546-b3bc-4a98-9cf1-282c5a8b0d5d.png)
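
   As a rough sanity check on the 20 to 30 minute gaps, the arithmetic can be 
sketched as below. It assumes the byte-rate limiter is the only bottleneck 
(which `isThrottleByBytes=true` suggests); `CompactionPacing` and its methods 
are hypothetical helpers, and the 1 GiB entry log size is an illustrative 
figure, not a value taken from this cluster.

```java
// Hypothetical helper, not part of BookKeeper: plain throughput arithmetic.
final class CompactionPacing {
    /** Seconds needed to pass liveBytes through a limiter of bytesPerSecond. */
    static double secondsToRewrite(long liveBytes, double bytesPerSecond) {
        return liveBytes / bytesPerSecond;
    }

    /** Bytes that pass through a limiter of bytesPerSecond in the given time. */
    static long bytesIn(double seconds, double bytesPerSecond) {
        return (long) (seconds * bytesPerSecond);
    }
}
```

   With `compactionRateByBytes=150000`, `bytesIn(25 * 60, 150000)` gives 
225,000,000 bytes, so a 25 minute gap is consistent with roughly 225 MB of 
live data being rewritten per entry log. Under the same assumption, 
`secondsToRewrite(1073741824L, 150000)` is about 7158 seconds, close to two 
hours for a fully live 1 GiB entry log.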
   
   When we run the command line tool to stop the bookie, 
`component-shutdown-thread` starts working, but it stays blocked in the 
shutdown logic of the GC thread.
   
![image](https://user-images.githubusercontent.com/35599757/146190762-afd73921-ad8f-4661-8369-7b24a6150ab6.png)
   
   The `GarbageCollectorThread` itself is blocked in `RateLimiter.acquire`.
   
![image](https://user-images.githubusercontent.com/35599757/146191195-dee9d53c-b864-4ad1-9698-1d26538f2cd9.png)
   
   We stopped the bookie at 14:10, but it did not actually stop until 14:42. 
The final stop time is determined by the last GC thread to return from the 
rate limiter.
   
   **`GarbageCollectorThread-15-1`**
   Stopped at 14:42:48
   
![image](https://user-images.githubusercontent.com/35599757/146191864-bac9fdb3-2e55-4c3d-b4bc-f3f8f3f5bd33.png)
   
   **`GarbageCollectorThread-21-1`**
   Stopped at 14:38:15
   
![image](https://user-images.githubusercontent.com/35599757/146192021-02c7f3d3-d77c-4a1a-ac84-4da7bd9ac566.png)
   
   **`GarbageCollectorThread-24-1`**
   Stopped at 14:22:04
   
![image](https://user-images.githubusercontent.com/35599757/146192130-64b96fb0-96ef-45bd-8529-cd7e12c7cdb4.png)
   
   **`GarbageCollectorThread-27-1`**
   Stopped at 14:36:53
   
![image](https://user-images.githubusercontent.com/35599757/146192222-11f90e40-f850-42ce-b06a-b5deed7934f4.png)
   
   
   ***To Reproduce***
   
   Steps to reproduce the behavior:
   
   Use the same compaction rate-limiting configuration as ours and write 
continuously at 30 MB/s. After running for a few days, trigger the TTL on the 
Pulsar broker, and then try to shut down a bookie node.
   
   ***Expected behavior***
   If the compactor is being rate-limited, shutdown should take priority over 
waiting in `RateLimiter.acquire`.
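
   One way to picture this expectation is a throttle whose wait can be cut 
short by a shutdown signal. The sketch below is only an illustration using the 
JDK: `ShutdownAwareThrottle` is a hypothetical class, not BookKeeper's code or 
Guava's `RateLimiter`, and it assumes a simple pacing model in which the cost 
of a request is paid by the next caller.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a pacing throttle whose wait ends early on shutdown.
final class ShutdownAwareThrottle {
    private final double permitsPerSecond;
    private final CountDownLatch shutdownSignal = new CountDownLatch(1);
    private long nextFreeNanos = System.nanoTime();

    ShutdownAwareThrottle(double permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
    }

    /**
     * Waits until the throttle is free, then reserves time for the permits.
     * Returns false if shutdown was requested or the thread was interrupted.
     */
    synchronized boolean acquire(long permits) {
        long waitNanos = Math.max(0L, nextFreeNanos - System.nanoTime());
        try {
            // await returns true as soon as shutdown() has been called,
            // so a long throttle wait does not delay shutdown.
            if (shutdownSignal.await(waitNanos, TimeUnit.NANOSECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        long now = System.nanoTime();
        long start = (nextFreeNanos - now > 0) ? nextFreeNanos : now;
        nextFreeNanos = start + (long) (permits / permitsPerSecond * 1e9);
        return true;
    }

    /** Wakes any waiter and makes all later acquire calls return false. */
    void shutdown() {
        shutdownSignal.countDown();
    }
}
```

   A compaction loop built on such a throttle could stop copying entries as 
soon as `acquire` returns `false`, instead of sleeping out the remaining 
throttle delay before noticing the shutdown request.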
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


