horizonzy opened a new issue, #3527:
URL: https://github.com/apache/bookkeeper/issues/3527

   **BUG REPORT**
   Recently, in our customer product, we found that the bookie direct memory 
increases slowly over time, finally it throws netty allocate direct memory 
failed exception, then bookie restart.
   
   **Dump info**
   So we dump the heap and found that there are some unreachable 
`ReadResponse`, and the data property `ByteBuf` refCnt is 2, and the  `ByteBuf` 
is also unreachable, so there is no chance to release the `ByteBuf`, it will be 
deleted after JVM GC.
   <img width="1563" alt="image" 
src="https://user-images.githubusercontent.com/22524871/195585954-608d8da5-de32-480e-91a5-d5b535eac6de.png";>
   
   See the picture, we can think the memory leak already happen. 
   
   **Log info**
   In the bookie log, we found there are many disconnect info like:
   ```
   "Oct 12, 2022 @ 07:47:28.673","","2022-10-12T07:47:28,672+0000 
[bookie-io-1-6] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - 
Channels disconnected: [id: 0x111df957, L:/xx.xx.xx.xx:3181 ! 
R:/xx.xx.xx.xx:44396]"
   ```
   
   
   **The Root Case**
   The memory leak occurred in an edge scene.
   In the normal case, we get the data from BookieImpl and wrap it in 
ReadResponse, and then write the 
[ReadResponse](https://github.com/apache/bookkeeper/blob/62155369d199f36720938e913541b223dab8b047/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtocol.java#L448)
 to the netty channel, there is our netty handler to handle the ReadResponse. 
   * 
[ResponseEncoder](https://github.com/apache/bookkeeper/blob/62155369d199f36720938e913541b223dab8b047/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java#L458)
 It transfer the ReadResponse to ByteBufList, see 
[encode](https://github.com/apache/bookkeeper/blob/62155369d199f36720938e913541b223dab8b047/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java#L251)
   
   The 
[ByteBufList](https://github.com/apache/bookkeeper/blob/62155369d199f36720938e913541b223dab8b047/bookkeeper-server/src/main/java/org/apache/bookkeeper/util/ByteBufList.java#L63)
 is implemented interce `ReferenceCounted`, so after netty flush it, netty will 
release it by invoke 
`io.netty.util.ReferenceCountUtil#release(java.lang.Object)`
   
   ```
       public static boolean release(Object msg, int decrement) {
           ObjectUtil.checkPositive(decrement, "decrement");
           if (msg instanceof ReferenceCounted) {
               return ((ReferenceCounted) msg).release(decrement);
           }
           return false;
       }
   ```
   
   In the `io.netty.util.ReferenceCountUtil#release(java.lang.Object)`, it will 
check if the msg is implemented ReferenceCounted. If not, do nothing.
   
   There is an important netty mechanism you should know, after a connection 
disconnect, there are two things that happen.
   1. all the handlers will be removed from the ChannelPipeline after 
channelUnregistered.
   
https://github.com/netty/netty/blob/7971075bbe9f5509c8b20c0e702ec2affb37d76e/transport/src/main/java/io/netty/channel/DefaultChannelPipeline.java#L1387-L1394
   At line_1392, it will remove all the handlers, only leaving two netty 
default contextHandler in the ChannelPipeline, 
[HeadContext](https://github.com/netty/netty/blob/7971075bbe9f5509c8b20c0e702ec2affb37d76e/transport/src/main/java/io/netty/channel/DefaultChannelPipeline.java#L1305)
 and 
[TailContext](https://github.com/netty/netty/blob/7971075bbe9f5509c8b20c0e702ec2affb37d76e/transport/src/main/java/io/netty/channel/DefaultChannelPipeline.java#L1245)
   
   2. Make AbstractChannel.AbstractUnsafe#outboundBuffer = null.
   
https://github.com/netty/netty/blob/7971075bbe9f5509c8b20c0e702ec2affb37d76e/transport/src/main/java/io/netty/channel/AbstractChannel.java#L703
   
   
   So we know, after the connection disconnect, the channel only left 
HeadContext and TailContext, and the 
AbstractChannel.AbstractUnsafe#outboundBuffer is null. If the bookie still 
write ReadResponse to netty channel, the `ResponseEncoder` didn't work(already 
removed), the HeadContext will write ReadResponse to 
AbstractUnsafe#outboundBuffer directly.
   
https://github.com/netty/netty/blob/7971075bbe9f5509c8b20c0e702ec2affb37d76e/transport/src/main/java/io/netty/channel/AbstractChannel.java#L847-L864
   
   see line_851, the outboundBuffer is null, it will release ReadResponse, but 
the ReadResponse didn't implement ReferenceCounted, so it didn't release the 
ReadResponse.data. **Memory leak**!!!
   
   
   ***To Reproduce***
   
   Steps to reproduce the behavior:
   1. Start bookies.
   2. Start a bookkeeper client using V2 protocol.
   3. use the bookkeeper client to read entries.
   4. In the reading process, kill -9 bookkeeper client programmer.
   5. Bookie server will memory leak.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to