zymap commented on issue #3751: URL: https://github.com/apache/bookkeeper/issues/3751#issuecomment-1427182286
I and @hangc0276 take a deep look into this issue. We found the root cause is that Netty casts the message to ReferenceCounted. When the bookkeeper gets the ReadResponse, we can not parse it correctly [at here](https://github.com/apache/bookkeeper/blob/591ceb971a7eda7ce5252f88c5cf694f3f4c30b2/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java#L1344) and then lose the response for read. In Netty, when we fireChannelRead, it passes the message as an Object to the next channel. ``` @Override public ChannelHandlerContext fireChannelRead(final Object msg) { invokeChannelRead(findContextInbound(MASK_CHANNEL_READ), msg); return this; } static void invokeChannelRead(final AbstractChannelHandlerContext next, Object msg) { final Object m = next.pipeline.touch(ObjectUtil.checkNotNull(msg, "msg"), next); EventExecutor executor = next.executor(); if (executor.inEventLoop()) { next.invokeChannelRead(m); } else { executor.execute(new Runnable() { @Override public void run() { next.invokeChannelRead(m); } }); } } ``` When the message comes out from the ResponseEnDeCoderPreV3 [decoding](https://github.com/apache/bookkeeper/blob/591ceb971a7eda7ce5252f88c5cf694f3f4c30b2/bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/BookieProtoEncoding.java#L577), it should be a ReadResponse. As we show upon, Netty passes it as an Object, then it will go to `final Object m = next.pipeline.touch(ObjectUtil.checkNotNull(msg, "msg"), next);`. DefaultChannelPipeline.touch: ``` final Object touch(Object msg, AbstractChannelHandlerContext next) { return touch ? ReferenceCountUtil.touch(msg, next) : msg; } ``` ReferenceCountUtil.touch(msg, next) ``` public static <T> T touch(T msg, Object hint) { if (msg instanceof ReferenceCounted) { return (T) ((ReferenceCounted) msg).touch(hint); } return msg; } ``` When it goes to `ReferenceCountUtil.touch(msg, next)`, the change https://github.com/apache/bookkeeper/pull/3528 will make it differently. Before the change, the ReadResponse is not a ReferenceCounted, it will return directly. The message type still is a ReadResponse. But since we change the ReadResponse to a ReferenceCounted, it will use the [touch method](https://github.com/apache/bookkeeper/pull/3528/files#diff-5bbab7efe840cf7924788317f180e2e6bc99c0e9dabfccc1dea5d5b26ed413ebR483) of ReadResponse. Then the type changed to ByteBuf. I will push a fix for that. And If you have time, please help to verify it! @MMirelli @eolivelli -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
