I've been setting up mon to watch a group of servers. As part of this setup, I have a Perl log-watching program that watches the system logs and sends traps to mon when relevant lines appear. Everything works beautifully until I send a lot of traps to mon in quick succession, at which point mon starts dropping them en masse. To verify this behavior, I used Mon::Client to write a simple Perl script that sends 100 traps to a mon service with a test.alert. Across repeated runs, the resulting test.alert.log ended up with anywhere from 52 to 96 entries, meaning that mon dropped up to almost half of the traps. Single runs with 1,000 and 10,000 traps produced 120 and 495 entries in the log, respectively. I also ran these tests with localhost as the logging host to rule out network problems and got similar results.
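The burst-loss effect described above can be reproduced without mon at all. Below is a minimal sketch in Python. It is not the original Perl script and does not speak mon's trap protocol: plain UDP datagrams stand in for traps, and the `seq=` tag and the `measure_udp_burst_loss` name are made up for illustration. It fires numbered datagrams at a local socket that nobody is reading, then drains that socket. Anything that did not fit in the receive buffer is silently discarded by the kernel, and the sequence numbers show exactly which datagrams went missing.

```python
import socket


def measure_udp_burst_loss(count, rcvbuf=4096):
    """Send `count` numbered datagrams to a local UDP socket without reading
    in between, then drain it. Returns (received_count, sorted_missing_seqs).
    Datagrams stand in for mon traps; this is not mon's trap protocol."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # A deliberately small receive buffer makes overflow easy to provoke;
    # the kernel may round this value up.
    receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    receiver.bind(("127.0.0.1", 0))
    addr = receiver.getsockname()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq in range(count):
        try:
            # Hypothetical payload format: the sequence number lets the
            # receiving side detect gaps.
            sender.sendto(f"trap seq={seq}".encode(), addr)
        except OSError:
            # Some platforms report buffer exhaustion (e.g. ENOBUFS) here;
            # either way the datagram is lost.
            pass
    sender.close()

    received = set()
    receiver.settimeout(0.2)  # stop once nothing more arrives
    try:
        while True:
            data, _ = receiver.recvfrom(2048)
            received.add(int(data.decode().split("seq=")[1]))
    except socket.timeout:
        pass
    finally:
        receiver.close()

    missing = sorted(set(range(count)) - received)
    return len(received), missing


if __name__ == "__main__":
    got, missing = measure_udp_burst_loss(1000)
    print(f"received {got} of 1000, dropped {len(missing)}")
```

Any receiver that cannot keep up with a burst, such as a single server process still busy handling earlier traps, sees the same kind of silent loss. Whether that is what happens inside mon is an assumption these numbers don't prove. Putting a sequence number in each real trap's text and checking the alert log for gaps is one way to verify which traps actually got through.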
I realize that UDP is an unreliable protocol, and that logging services generally use it so the sender never hangs and misses something important, but the behavior above rules out using mon traps in any situation where missing an alert would be a Bad Thing (which I'd think is almost every situation in which mon traps are used). Is there something I'm missing here? Is there some way to verify that traps are really getting through to the server?

-hal
