Could this simply be because TCP avoids (or tries to avoid) congestion while 
UDP does not?
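
One rough way to test that idea is to watch the kernel's UDP counters on a GL host before and after a UDP burst. The sketch below assumes a Linux host where /proc/net/snmp exposes a "Udp:" header line followed by a "Udp:" value line. The function names are my own and not part of Graylog.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = udp_lines[0], udp_lines[1]
    # Skip the leading "Udp:" token on both lines and pair names with values.
    return dict(zip(header[1:], (int(v) for v in values[1:])))


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only; the path is an assumption)."""
    with open(path) as f:
        return parse_udp_counters(f.read())
```

If RcvbufErrors or InErrors climbs during the UDP test, the kernel is dropping datagrams before Graylog ever sees them. If the counters stay flat, the bottleneck is more likely inside the process buffer itself.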

/HJ

> On 24 Feb 2015, at 13:50, [email protected] wrote:
> 
> Hello,
> 
> With the release of 1.0 we've started moving towards a new cluster of GL 
> hosts. These are working very well, with one exception.
> For some reason, any reasonably significant UDP traffic will choke the 
> message processor, fill up the process buffers on all four hosts, and 
> effectively choke all other message processing as well.
> Normally we do around 2k messages per second, split roughly 50/50 between TCP 
> and UDP. Sending the entire TCP load to one host doesn't present a problem; 
> the host doesn't break a sweat.
> 
> I've also experimented a little with sending a large text file using 
> rsyslog's imfile module. Sending it via TCP bottlenecks us on the ES side 
> of things and causes the disk journal to fill up fairly rapidly, but it's 
> still working at ~9k messages per second, so that's fine. Sending it via UDP 
> just causes GL to choke again: it fills the journal up to a certain point 
> and then very slowly processes the journal in little bursts of a few 
> thousand messages, followed by several seconds of apparent sleeping (i.e. 
> pretty much no CPU usage).
> 
> During all of this, the input buffer never fills up beyond single-digit 
> percentages. With TCP the output buffer sometimes moves up to 20-30%; 
> with UDP it never moves at all. It's all in the process buffer. Sending a 
> large burst of messages and then stopping doesn't seem to affect this 
> behavior either: even after the inbound messages stop, it still takes a long 
> time to process the messages that are already in the journal and process 
> buffer.
> I'm using VisualVM to look at the CPU and memory usage. This is a screenshot 
> of a UDP session:
> http://i59.tinypic.com/x23xfl.png
> 
> I've tried mucking around with various knobs (processbuffer_processors, JVM 
> settings, etc.), with no effect whatsoever, good or bad.
> There's nothing to suggest a problem in either the Graylog or the system logs.
> 
> Pertinent specs and settings:
> ring_size = 16384 (CPUs have 20 MB L3)
> processbuffer_processors = 5
> 
> Java 8u31
> Using G1GC with StringDeduplication. I've tried without the latter, and with 
> just CMS as well; no difference.
> 4 GB Xmx/Xms.
> Linux 3.16.0
> net.core.rmem_max = 8388608
> 
> These are virtual machines on VMware, 8 GB / 8 vCPUs, Xeon E5-2690s.
> 
> Software wise the old nodes are running the same setup more or less, except 
> kernel 3.2.0, same JVM, G1GC, etc. Hardware wise, they're physical boxes, old 
> Dell 2950's with dual quad core E5440's. That's Core2 era so quite a bit 
> slower.
> 
> Any ideas?
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "graylog2" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] 
> <mailto:[email protected]>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.
