dlmarion commented on PR #5550: URL: https://github.com/apache/accumulo/pull/5550#issuecomment-2898221030
> Some observations:
>
> It appears that based on the referenced Thrift code, if the incoming frame is less than the max frame size, but reading the frame would use more memory than the total, it returns `true` (for the connection to remain alive) and does **not** read the frame off the network stack.
>
> The tcp stack has configuration knobs for setting the buffer size on the sender and receiver side, and for how much data can be in flight.
>
> I think the other question here is what happens on the Thrift sender side when the total max memory is reached on the receiver side. Does it just sit there and wait indefinitely? If the sender waits forever and does not get a timeout or other exception, then maybe there is nothing to do here.

I added an [IT](https://github.com/dlmarion/accumulo/commit/78eefc22c43bb213b1fafd263f123f4164fa883b) to test what happens in this case. The IT starts a Coordinator derivative that sleeps when `updateCompactionStatus` is called. `updateCompactionStatus` takes a String as an argument, so I set `RPC_MAX_MESSAGE_SIZE` to 3MB and then created 4 Thrift clients that each call `updateCompactionStatus` with a String containing 1 million characters. Combined, the first three calls stay under `RPC_MAX_MESSAGE_SIZE`, so they are allowed into the Coordinator. The fourth call crosses the total message size boundary, so it sits there and waits.

You can see the 4 TCP connections from the test to the Coordinator: 3 of them have no bytes in their receive and send queues, and 1 of them has almost 1 million bytes in its receive queue. You can also see in the Coordinator log that 3 messages are received.

The IT sits in this state until the default 10-minute timeout expires. It does not fail sooner because `GENERAL_RPC_TIMEOUT` is set to zero, which means no timeout. If `GENERAL_RPC_TIMEOUT` is set, the Thrift clients get a timeout exception and the test fails.
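To make the IT arithmetic concrete, here is a simplified Java model of the admission check described in the quote. It is not Thrift's code: `FrameBudgetSketch`, `FrameBudget`, `tryAdmit`, and `simulate` are hypothetical names, the 100-byte per-message Thrift overhead is an assumed figure, 3MB is taken as 3 MiB, and, as in the IT, the same limit is assumed to bound both a single frame and the total.

```java
import java.util.ArrayList;
import java.util.List;

public class FrameBudgetSketch {

  // Outcome of looking at a frame header (hypothetical model, not Thrift's API).
  enum Decision { REJECT, WAIT, ADMIT }

  static final class FrameBudget {
    private final long maxFrameSize;
    private final long maxTotalBytes;
    private long allocated = 0;

    FrameBudget(long maxFrameSize, long maxTotalBytes) {
      this.maxFrameSize = maxFrameSize;
      this.maxTotalBytes = maxTotalBytes;
    }

    synchronized Decision tryAdmit(long frameSize) {
      if (frameSize > maxFrameSize) {
        return Decision.REJECT; // can never fit: the connection would be closed
      }
      if (allocated + frameSize > maxTotalBytes) {
        // Keep the connection alive but leave the bytes in the kernel receive queue.
        return Decision.WAIT;
      }
      allocated += frameSize;
      return Decision.ADMIT;
    }

    synchronized void release(long frameSize) {
      allocated -= frameSize;
    }
  }

  // Models the IT: the handler sleeps, so no admitted frame is ever released.
  static List<Decision> simulate(int calls, long payloadBytes, long overheadBytes, long limit) {
    FrameBudget budget = new FrameBudget(limit, limit); // assumption: same limit for both checks
    List<Decision> decisions = new ArrayList<>();
    for (int i = 0; i < calls; i++) {
      decisions.add(budget.tryAdmit(payloadBytes + overheadBytes));
    }
    return decisions;
  }

  public static void main(String[] args) {
    long limit = 3L * 1024 * 1024; // "3MB", assumed to mean 3 MiB
    System.out.println(simulate(4, 1_000_000L, 100L, limit));
  }
}
```

Under those assumptions `main` prints `[ADMIT, ADMIT, ADMIT, WAIT]`, which lines up with the 3 received messages and the 1 connection holding about 1 million bytes in its receive queue. Since the sleeping handler never triggers `release`, the fourth frame stays in the `WAIT` state indefinitely.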
In the `Compactor`, this would likely end up with the call being retried due to the `RetryableThriftCall`.
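A minimal sketch of retry-on-timeout behavior of the kind described for the `Compactor`. It does not reflect how `RetryableThriftCall` is actually implemented: `RetryOnTimeoutSketch`, `retryOnTimeout`, the attempt limit, and the exponential backoff are hypothetical, and the timeout is modeled as a plain `java.net.SocketTimeoutException` (a real Thrift client typically reports it wrapped in a `TTransportException`).

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class RetryOnTimeoutSketch {

  // Hypothetical helper: retries only on timeouts, doubling the backoff up to 10s.
  static <T> T retryOnTimeout(Callable<T> call, int maxAttempts, long initialBackoffMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long backoff = initialBackoffMillis;
    SocketTimeoutException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (SocketTimeoutException e) {
        last = e; // other exceptions propagate immediately
        if (attempt < maxAttempts) {
          Thread.sleep(backoff);
          backoff = Math.min(backoff * 2, 10_000L);
        }
      }
    }
    throw last; // every attempt timed out
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    String result = retryOnTimeout(() -> {
      calls[0]++;
      if (calls[0] < 3) {
        throw new SocketTimeoutException("simulated RPC timeout");
      }
      return "status accepted";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

In this model each retry resends the full `updateCompactionStatus` payload, so if the Coordinator's memory budget is still exhausted, the retried call would presumably wait in the same way until it times out again.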