Hello,

With the release of 1.0 we've started moving towards a new cluster of GL hosts. These are working very well, with one exception. For some reason any reasonably significant UDP traffic will choke the message processor, fill up the process buffers on all four hosts, and effectively choke up all other message processing as well. Normally we do around 2k messages per second, split roughly 50/50 between TCP and UDP. Sending the entire TCP load to one host doesn't present a problem; it doesn't break a sweat.
I've also experimented a little with sending a large text file using rsyslog's imfile module. Sending it via TCP will bottleneck us at the ES side of things and cause the disk journal to fill up fairly rapidly, but it's still working at ~9k messages per second, so that's fine. Sending it via UDP just causes GL to choke again: it fills up the journal to a certain point and then very slowly processes the journal in little bursts of a few thousand messages, followed by several seconds of apparent sleeping (i.e. pretty much no CPU usage).

During all of this the input buffer never fills up beyond single-digit percentages. Using TCP the output buffer sometimes moves up to 20-30%; with UDP it never moves at all. It's all in the process buffer. Sending a large burst of messages and then stopping doesn't seem to affect this behavior either: even after the inbound messages stop, it still takes a long time to process the messages that are already in the journal and process buffer.

I'm using VisualVM to look at the CPU and memory usage. This is a screenshot of a UDP session: http://i59.tinypic.com/x23xfl.png

I've tried mucking around with various knobs (processbuffer_processors, JVM settings, etc.) with no results whatsoever, good or bad. There's nothing to suggest a problem in either the graylog or the system logs.

Pertinent specs and settings:

ring_size = 16384 (CPUs have 20 MB L3)
processbuffer_processors = 5
Java 8u31
Using G1GC with StringDeduplication; I've tried without the latter and just using CMS as well, no difference.
4 GB Xmx/Xms
Linux 3.16.0
net.core.rmem_max = 8388608

These are virtual machines: VMware, 8 GB / 8 vCPUs, Xeon E5-2690s.

Software-wise the old nodes are running more or less the same setup, except kernel 3.2.0: same JVM, G1GC, etc. Hardware-wise, they're physical boxes, old Dell 2950s with dual quad-core E5440s. That's Core2 era, so quite a bit slower.

Any ideas?
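P.S. For anyone wanting to reproduce the TCP vs. UDP comparison without rsyslog, a rough sketch of an equivalent sender follows. This is not what was actually used for the tests above (that was rsyslog's imfile module); the function name send_lines, the file name, the host, and the port are all placeholders, and it sends raw lines rather than properly framed syslog messages.

```python
import socket


def send_lines(lines, host, port, proto="udp"):
    """Send each non-empty line as one message; return how many were sent.

    Hypothetical helper for load testing: UDP sends one datagram per line,
    TCP sends newline-terminated lines over a single connection.
    """
    sent = 0
    if proto == "udp":
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            for line in lines:
                payload = line.rstrip("\r\n").encode("utf-8")
                if not payload:
                    continue  # skip blank lines
                sock.sendto(payload, (host, port))
                sent += 1
    elif proto == "tcp":
        with socket.create_connection((host, port)) as sock:
            for line in lines:
                payload = line.rstrip("\r\n").encode("utf-8")
                if not payload:
                    continue  # skip blank lines
                sock.sendall(payload + b"\n")
                sent += 1
    else:
        raise ValueError("proto must be 'udp' or 'tcp'")
    return sent


# Example usage (placeholder file, host, and port):
#   with open("big.log", encoding="utf-8", errors="replace") as f:
#       print(send_lines(f, "graylog.example.org", 514, proto="udp"))
```

Running it once with proto="tcp" and once with proto="udp" against the same input file should give roughly the same comparison as the imfile test, without any sender-side rate limiting.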
