[
https://issues.apache.org/jira/browse/CASSANDRA-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521107#comment-14521107
]
Robert Stupp commented on CASSANDRA-7029:
-----------------------------------------
bq. Intel DPDK
Very illustrating: Figures 12 +22. They nicely show the effect (or absence) of
CPU affinity (by far the best is IRQ + application on different cores same
socket). But even selecting the "best” socket is very hardware dependent
(Figure 13). Link to OSS BSD project http://dpdk.org/
(I neither argue for or against DPDK - it's really nice, but only available for
Linux + FreeBSD).
> Investigate alternative transport protocols for both client and inter-server
> communications
> -------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-7029
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7029
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Labels: performance
> Fix For: 3.x
>
>
> There are a number of reasons to think we can do better than TCP for our
> communications:
> 1) We can actually tolerate sporadic small message losses, so guaranteed
> delivery isn't essential (although for larger messages it probably is)
> 2) As shown in \[1\] and \[2\], Linux can behave quite suboptimally with
> regard to TCP message delivery when the system is under load. Judging from
> the theoretical description, this is likely to apply even when the
> system-load is not high, but the number of processes to schedule is high.
> Cassandra generally has a lot of threads to schedule, so this is quite
> pertinent for us. UDP performs substantially better here.
> 3) Even when the system is not under load, UDP has a lower CPU burden, and
> that burden is constant regardless of the number of connections it processes.
> 4) On a simple benchmark on my local PC, using non-blocking IO for UDP and
> busy spinning on IO I can actually push 20-40% more throughput through
> loopback (where TCP should be optimal, as no latency), even for very small
> messages. Since we can see networking taking multiple CPUs' worth of time
> during a stress test, using a busy-spin for ~100micros after last message
> receipt is almost certainly acceptable, especially as we can (ultimately)
> process inter-server and client communications on the same thread/socket in
> this model.
> 5) We can optimise the threading model heavily: since we generally process
> very small messages (200 bytes not at all implausible), the thread signalling
> costs on the processing thread can actually dramatically impede throughput.
> In general it costs ~10micros to signal (and passing the message to another
> thread for processing in the current model requires signalling). For 200-byte
> messages this caps our throughput at 20MB/s.
> I propose to knock up a highly naive UDP-based connection protocol with
> super-trivial congestion control over the course of a few days, with the only
> initial goal being maximum possible performance (not fairness, reliability,
> or anything else), and trial it in Netty (possibly making some changes to
> Netty to mitigate thread signalling costs). The reason for knocking up our
> own here is to get a ceiling on what the absolute limit of potential for this
> approach is. Assuming this pans out with performance gains in C* proper, we
> then look to contributing to/forking the udt-java project and see how easy it
> is to bring performance in line with what we can get with our naive approach
> (I don't suggest starting here, as the project is using blocking old-IO, and
> modifying it with latency in mind may be challenging, and we won't know for
> sure what the best case scenario is).
> \[1\]
> http://test-docdb.fnal.gov/0016/001648/002/Potential%20Performance%20Bottleneck%20in%20Linux%20TCP.PDF
> \[2\]
> http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=1968;filename=Performance%20Analysis%20of%20Linux%20Networking%20-%20Packet%20Receiving%20(Official).pdf;version=2
> Further related reading:
> http://public.dhe.ibm.com/software/commerce/doc/mft/cdunix/41/UDTWhitepaper.pdf
> https://mospace.umsystem.edu/xmlui/bitstream/handle/10355/14482/ChoiUndPerTcp.pdf?sequence=1
> https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.153.3762&rep=rep1&type=pdf
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)