ege-st commented on code in PR #11496:
URL: https://github.com/apache/pinot/pull/11496#discussion_r1319899550


##########
pinot-core/src/main/java/org/apache/pinot/core/transport/DataTableHandler.java:
##########
@@ -83,5 +98,12 @@ protected void channelRead0(ChannelHandlerContext ctx, ByteBuf msg) {
   public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
     LOGGER.error("Caught exception while handling response from server: {}", _serverRoutingInstance, cause);
     _brokerMetrics.addMeteredGlobalValue(BrokerMeter.RESPONSE_FETCH_EXCEPTIONS, 1);
+    if (cause instanceof java.lang.OutOfMemoryError) {

Review Comment:
   @gortiz 
   
   Re broker sending a max size: I don't have strong opinions on implementation 
details, but once we get a bandaid for the Direct OOM issue then it'd be 
awesome to dive into discussing ways we might do proactive resource management.
   
   Re the Netty config option: I will test that proposal but I'm not sure that 
it will prevent Netty from loading the entire message into Direct Memory. We'll 
see with testing.
   
   One thing is that our payload format has no message-length field (which is unfortunate), and that field seems to be exactly what this feature is built around. The implication is that the option might work as a "max buffer size" that drops messages exceeding the buffer, but since we don't know how many bytes a message should contain, we can't reconstruct a message from multiple buffers. Put another way: without changing our message format, what we can do is set a max buffer size that won't cause an OOM, attempt to deserialize the message, and if that fails, drop the message and return an error to the application layer. Without a length field in the message, we can't reconstruct it, because we won't know when to stop appending byte buffers to our aggregate.
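   To illustrate the point about reconstruction: with a length prefix, a reader can accumulate partial buffers and knows exactly when a message is complete. A minimal sketch in plain `java.nio` (purely illustrative; this is not Pinot's actual wire format, and the class/method names are mine):
   
   ```java
   import java.nio.ByteBuffer;
   
   // Hypothetical framing: [4-byte big-endian length][payload].
   // The length prefix is what makes multi-buffer reassembly possible.
   final class LengthPrefixedFrame {
     private LengthPrefixedFrame() {
     }
   
     // Encode a payload with its length prepended.
     static byte[] encode(byte[] payload) {
       return ByteBuffer.allocate(4 + payload.length)
           .putInt(payload.length)
           .put(payload)
           .array();
     }
   
     // Decode a complete frame, or return null if more bytes are needed.
     // Without the length field, "more bytes are needed" is undecidable.
     static byte[] tryDecode(byte[] received) {
       if (received.length < 4) {
         return null; // length prefix itself not yet complete
       }
       ByteBuffer buf = ByteBuffer.wrap(received);
       int length = buf.getInt();
       if (buf.remaining() < length) {
         return null; // keep appending buffers until `length` bytes arrive
       }
       byte[] payload = new byte[length];
       buf.get(payload);
       return payload;
     }
   }
   ```
   
   This is essentially the contract that Netty's length-field-based framing assumes, and it's the piece our format is missing.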
   
   However, even if it works and we use it as the fix, I still think we need explicit handling of Direct Memory OOMs in Netty. There will always be some confluence of events that makes them happen, and making sure we handle them gracefully is, imo, a baseline need.
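   As a sketch of what I mean by explicit handling (plain Java, no Netty types; the helper name is mine, not something in this PR): the `instanceof` check in the diff only catches a bare `OutOfMemoryError`, so it may be worth walking the cause chain too, since the error can arrive wrapped in another exception by the time it reaches `exceptionCaught`.
   
   ```java
   // Hypothetical helper: detect a (possibly wrapped) OutOfMemoryError so the
   // handler can take a dedicated recovery path instead of generic error logging.
   public final class DirectOomDetector {
     private DirectOomDetector() {
     }
   
     // Returns true if any throwable in the cause chain is an OutOfMemoryError.
     // Depth-capped so a cyclic cause chain can't loop forever.
     public static boolean isOutOfMemory(Throwable cause) {
       int depth = 0;
       for (Throwable t = cause; t != null && depth < 16; t = t.getCause(), depth++) {
         if (t instanceof OutOfMemoryError) {
           return true;
         }
       }
       return false;
     }
   }
   ```
   
   In `exceptionCaught` this would gate whatever graceful-degradation path we decide on (fail in-flight queries, release buffers, close the channel) rather than treating the OOM like any other fetch exception.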



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

