ivakegg commented on issue #3334: URL: https://github.com/apache/accumulo/issues/3334#issuecomment-1790897950
We had another incident, #3909, that can be attributed in part to this ticket. In that case it was not only the monitor that could not handle the truth, but the tservers as well. The monitor fell over, and its node became inaccessible because of the number of IRQ requests hitting it from the tservers. Many tservers fell over from out-of-memory errors, and many others had various threads die (also out of memory) but kept running. In the end the entire system had to be brought down hard.

So probably the best thing that can be done for this ticket is not to modify the monitor, but to focus on the number of messages being sent by the tservers. Even if a loop is spamming messages (on the order of 1 per millisecond in that case), we should not try to send every single message to the monitor. There should be some deduplication of messages on the tserver side if possible, and perhaps the queue of messages being sent to the logger needs to be bounded in size. I don't know whether we can do this given that we use standard logging infrastructure such as log4j, but it should be investigated.
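To make the proposal concrete, here is a minimal sketch of tserver-side deduplication combined with a bounded queue. It is independent of log4j and of Accumulo: the class `DedupingBoundedQueue`, its parameters, and the choice to key duplicates on the exact message string are all assumptions made for illustration, not existing code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper, not part of Accumulo or log4j.
// Suppresses repeats of the same message within a time window and
// drops messages instead of blocking or growing when the queue is full.
final class DedupingBoundedQueue {
  private final BlockingQueue<String> queue;
  private final long windowMillis;
  private final Map<String, Long> lastQueued;
  private long suppressed = 0;
  private long dropped = 0;

  DedupingBoundedQueue(int capacity, long windowMillis, final int maxTracked) {
    this.queue = new ArrayBlockingQueue<>(capacity);
    this.windowMillis = windowMillis;
    // Bound the dedup table too, so distinct messages cannot exhaust memory.
    this.lastQueued = new LinkedHashMap<String, Long>(16, 0.75f, false) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        return size() > maxTracked;
      }
    };
  }

  // Returns true if the message was queued, false if suppressed or dropped.
  synchronized boolean offer(String message, long nowMillis) {
    Long last = lastQueued.get(message);
    if (last != null && nowMillis - last < windowMillis) {
      suppressed++; // same message was queued recently
      return false;
    }
    if (!queue.offer(message)) {
      dropped++; // queue is full: shed load rather than block the caller
      return false;
    }
    lastQueued.put(message, nowMillis);
    return true;
  }

  // Called by the sender thread; returns null when nothing is pending.
  String poll() {
    return queue.poll();
  }

  int size() {
    return queue.size();
  }

  synchronized long suppressedCount() {
    return suppressed;
  }

  synchronized long droppedCount() {
    return dropped;
  }
}
```

With a 1-second window, a loop emitting the same message once per millisecond would queue roughly one message per second instead of a thousand, and the counters could be reported periodically so the suppressed volume stays visible. Log4j 2 also ships a `BurstFilter` and a non-blocking mode on its `AsyncAppender` that may cover part of this; whether they fit the appender feeding the monitor has not been checked here.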
