ivakegg commented on issue #3334:
URL: https://github.com/apache/accumulo/issues/3334#issuecomment-1790897950

   So we had another situation with #3909 that could be considered partly a 
consequence of this ticket.  In that case it was not only the monitor that 
could not handle the truth, but the tservers as well.  The monitor fell over, 
and its node became inaccessible because of the number of IRQ requests hitting 
it from the tservers.  Many tservers fell over with out-of-memory errors, and 
many others had various threads die (out of memory) but remained running.  In 
the end the entire system had to be brought down hard.
   
   So, probably the best thing that can be done for this ticket is not to 
modify the monitor, but rather to focus on the number of messages being sent 
by the tservers.  Even if there is a loop that is spamming messages (on the 
order of 1 per millisecond in that case), we should not be trying to send 
every single message to the monitor.  There needs to be some deduplication of 
messages on the tserver side if possible, and perhaps the queue of messages 
being sent to the logger needs to be bounded in size.  I don't know if we can 
do this given that we are using standard logging infrastructure such as log4j, 
but it should be investigated.
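
   To make the deduplication idea concrete, here is a minimal sketch, not 
Accumulo code.  The class `MessageDeduper` and its method `shouldForward` are 
hypothetical names used only for illustration, and it assumes that comparing 
the message text is a good enough test for "the same message".  It suppresses 
repeats of a message inside a time window and keeps its own bookkeeping 
bounded, so a spamming loop cannot grow it without limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: suppresses repeats of a message within a time window. */
public class MessageDeduper {
  private final long windowMillis;
  private final int maxTrackedMessages;
  // Access-ordered map, so the least recently seen message is evicted first.
  private final LinkedHashMap<String, Long> lastForwarded;

  public MessageDeduper(long windowMillis, int maxTrackedMessages) {
    this.windowMillis = windowMillis;
    this.maxTrackedMessages = maxTrackedMessages;
    this.lastForwarded = new LinkedHashMap<String, Long>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        // Bound memory use: drop the oldest entry once the cap is exceeded.
        return size() > MessageDeduper.this.maxTrackedMessages;
      }
    };
  }

  /** Returns true if the message should be sent on, false if it is a recent duplicate. */
  public synchronized boolean shouldForward(String message, long nowMillis) {
    Long last = lastForwarded.get(message);
    if (last != null && nowMillis - last < windowMillis) {
      return false; // same message already forwarded within the window
    }
    lastForwarded.put(message, nowMillis);
    return true;
  }
}
```

   With a one second window, a loop logging the same message once per 
millisecond would forward roughly one message per second instead of a 
thousand.  Whether this belongs in a custom log4j filter or appender, or can 
be done with existing log4j configuration, is exactly the part that still 
needs investigating.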


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
