C. Scott Andreas updated CASSANDRA-7029:
    Component/s: Streaming and Messaging

> Investigate alternative transport protocols for both client and inter-server 
> communications
> -------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-7029
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7029
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL, Streaming and Messaging
>            Reporter: Benedict
>            Priority: Major
>              Labels: performance
>             Fix For: 4.x
> There are a number of reasons to think we can do better than TCP for our 
> communications:
> 1) We can actually tolerate sporadic small message losses, so guaranteed 
> delivery isn't essential (although for larger messages it probably is)
> 2) As shown in \[1\] and \[2\], Linux can behave quite suboptimally with 
> regard to TCP message delivery when the system is under load. Judging from 
> the theoretical description, this is likely to apply even when the 
> system-load is not high, but the number of processes to schedule is high. 
> Cassandra generally has a lot of threads to schedule, so this is quite 
> pertinent for us. UDP performs substantially better here.
> 3) Even when the system is not under load, UDP has a lower CPU burden, and 
> that burden is constant regardless of the number of connections it processes. 
> 4) On a simple benchmark on my local PC, using non-blocking IO for UDP and 
> busy spinning on IO I can actually push 20-40% more throughput through 
> loopback (where TCP should be optimal, as no latency), even for very small 
> messages. Since we can see networking taking multiple CPUs' worth of time 
> during a stress test, using a busy-spin for ~100micros after last message 
> receipt is almost certainly acceptable, especially as we can (ultimately) 
> process inter-server and client communications on the same thread/socket in 
> this model.
> 5) We can optimise the threading model heavily: since we generally process 
> very small messages (200 bytes not at all implausible), the thread signalling 
> costs on the processing thread can actually dramatically impede throughput. 
> In general it costs ~10micros to signal (and passing the message to another 
> thread for processing in the current model requires signalling). For 200-byte 
> messages this caps our throughput at 20MB/s.
> I propose to knock up a highly naive UDP-based connection protocol with 
> super-trivial congestion control over the course of a few days, with the only 
> initial goal being maximum possible performance (not fairness, reliability, 
> or anything else), and trial it in Netty (possibly making some changes to 
> Netty to mitigate thread signalling costs). The reason for knocking up our 
> own here is to get a ceiling on what the absolute limit of potential for this 
> approach is. Assuming this pans out with performance gains in C* proper, we 
> then look to contributing to/forking the udt-java project and see how easy it 
> is to bring performance in line with what we can get with our naive approach 
> (I don't suggest starting here, as the project is using blocking old-IO, and 
> modifying it with latency in mind may be challenging, and we won't know for 
> sure what the best case scenario is).
> \[1\] 
> http://test-docdb.fnal.gov/0016/001648/002/Potential%20Performance%20Bottleneck%20in%20Linux%20TCP.PDF
> \[2\] 
> http://cd-docdb.fnal.gov/cgi-bin/RetrieveFile?docid=1968;filename=Performance%20Analysis%20of%20Linux%20Networking%20-%20Packet%20Receiving%20(Official).pdf;version=2
> Further related reading:
> http://public.dhe.ibm.com/software/commerce/doc/mft/cdunix/41/UDTWhitepaper.pdf
> https://mospace.umsystem.edu/xmlui/bitstream/handle/10355/14482/ChoiUndPerTcp.pdf?sequence=1
> https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Web_Platform/5/html/Administration_And_Configuration_Guide/jgroups-perf-udpbuffer.html
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to