dlmarion commented on PR #5550:
URL: https://github.com/apache/accumulo/pull/5550#issuecomment-2898221030

   > Some observations:
   > 
   > It appears that based on the referenced Thrift code, if the incoming frame 
is less than the max frame size, but reading the frame would use more memory 
than the total, it returns `true` (for the connection to remain alive) and does 
**not** read the frame off the network stack.
   > 
   > The tcp stack has configuration knobs for setting the buffer size on the 
sender and receiver side, and for how much data can be in flight.
   > 
   > I think the other question here is what happens on the Thrift sender side 
when the total max memory is reached on the receiver side. Does it just sit 
there and wait indefinitely? If the sender waits forever and does not get a 
timeout or other exception, then maybe there is nothing to do here.
   
   I added an 
[IT](https://github.com/dlmarion/accumulo/commit/78eefc22c43bb213b1fafd263f123f4164fa883b)
 to test what happens in this case. The IT starts a Coordinator derivative that 
sleeps when `updateCompactionStatus` is called. `updateCompactionStatus` takes 
a String as an argument, so I set the `RPC_MAX_MESSAGE_SIZE` to 3MB and then 
created 4 Thrift clients that each called `updateCompactionStatus` passing a 
String that contains 1 million characters.
   
   The first three Thrift client calls together are less than the 
`RPC_MAX_MESSAGE_SIZE`, so they are allowed into the Coordinator. The fourth 
call crosses the total message size boundary, so it sits there and waits. You 
can see the 4 TCP connections from the test to the Coordinator, with 3 of them 
having no bytes in the receive and send queues, and 1 of them having almost 
1 million bytes in the receive queue. You can also see in the Coordinator log 
that 3 messages are received.
   
   The IT will sit there in this state for 10 minutes, the IT's default 
timeout. This is because the `GENERAL_RPC_TIMEOUT` is set to zero, which means 
no RPC timeout. If `GENERAL_RPC_TIMEOUT` is set, then the Thrift clients get a 
timeout exception and the test fails. In the `Compactor`, this would likely end 
up with the call being retried by the `RetryableThriftCall`.
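
   The difference between a zero and a non-zero timeout can be reproduced 
with plain sockets. The sketch below assumes the RPC timeout ultimately behaves 
like a socket read timeout (`SO_TIMEOUT`); it does not use Thrift or Accumulo, 
and the class and method names are hypothetical. The server side accepts the 
connection in the kernel backlog but never reads or replies, which stands in 
for the stalled fourth call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo: a client waiting on a peer that never answers. */
final class ReadTimeoutSketch {

  /**
   * Connects to a local listener that never reads or replies, sends a few bytes,
   * then waits for a response. Returns true if the wait ended in a timeout.
   * A timeoutMillis of 0 means "wait forever", so only pass a positive value here.
   */
  static boolean readTimesOut(int timeoutMillis) {
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
         Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
      client.setSoTimeout(timeoutMillis);
      client.getOutputStream().write(new byte[] {1, 2, 3});
      client.getOutputStream().flush();
      try {
        client.getInputStream().read(); // blocks: the peer never sends anything
        return false;
      } catch (SocketTimeoutException e) {
        return true; // a caller could retry here instead of failing
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("timed out: " + readTimesOut(200));
  }
}
```

   With a positive timeout the client gets an exception it can react to, for 
example by retrying. With a timeout of zero the same read would block until 
something else, such as the test harness, gives up.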

