[ 
https://issues.apache.org/jira/browse/CASSANDRA-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517024#comment-14517024
 ] 

Benedict commented on CASSANDRA-9237:
-------------------------------------

[~jbellis]: I don't feel strongly about this, since it is a borderline case, 
but it seems to me that truncating the outbound queue and waiting for another 
stimulus to restart message delivery might make matters worse, by 
underutilizing an already insufficient connection. The reconnection itself 
isn't such a problem, but message and hint delivery will be choppy. What we 
really want in this scenario is uninterrupted saturation of the link, so that 
the cluster state stays as up to date as possible. If these incidents are 
temporary hiccups, that would mean recovering more gracefully: just as 
memtables permit us to cope with temporary peaks above disk throughput, we 
should ideally be able to cope with temporary peaks above network throughput. 
If the load is persistently above network capacity, I agree there's little we 
can do.
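
To illustrate the distinction between a temporary peak and persistent 
overload, here is a minimal sketch. It is a toy model, not Cassandra code: the 
function name, the per-tick message counts, and the capacity figure are all 
made up for illustration.

```python
def backlog_over_time(arrivals, capacity):
    """Track the outbound backlog when the link drains `capacity` messages per tick.

    `arrivals` is a list of messages enqueued per tick (hypothetical numbers).
    The queue is never truncated; the link is kept saturated while work remains.
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        # Whatever the link cannot send this tick stays queued for the next one.
        backlog = max(0, backlog + arriving - capacity)
        history.append(backlog)
    return history


# A temporary peak above link capacity: the backlog builds, then drains to zero.
temporary = backlog_over_time([5, 5, 20, 20, 5, 5, 5, 5], capacity=10)

# Load persistently above capacity: the backlog grows without bound.
persistent = backlog_over_time([15, 15, 15, 15], capacity=10)

print(temporary)
print(persistent)
```

In the first case buffering alone is enough to recover, much as a memtable 
absorbs a burst of writes; in the second no amount of buffering helps.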

[~aweisberg] are you proposing to deliver gossip through SCTP along with the 
larger messages? Its mechanism for coping with HoL issues is basically the same 
as we intend to deliver with streaming network operations: chunked delivery of 
any single operation, so that each operation gets a fair amount of bandwidth. 
Because we have many operations, it's not clear this would actually help: in 
our theoretical under-provisioned link we could have thousands (or hundreds of 
thousands) of operations running concurrently, which could still HoL block 
something like gossip. Of course we could treat gossip as its own stream, and 
everything else as one stream, but then I'm not sure what we'd be gaining over 
a regular TCP connection dedicated to gossip.
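
The concern can be made concrete with a small sketch. It is a toy model of 
round-robin chunked delivery, not SCTP or Cassandra's messaging code; the 
function name, the `gossip_priority` flag, and the operation counts are 
hypothetical.

```python
from collections import deque


def chunks_before_gossip(num_operations, chunks_per_operation, gossip_priority=False):
    """Count chunks sent ahead of a single-chunk gossip message.

    Every operation is its own stream and streams are served one chunk at a
    time in round-robin order. Gossip arrives after all operations are queued.
    """
    # Each stream is [name, remaining_chunks].
    streams = deque(["op", chunks_per_operation] for _ in range(num_operations))
    gossip = ["gossip", 1]
    if gossip_priority:
        # Assumption: gossip has a dedicated stream that is always served first.
        streams.appendleft(gossip)
    else:
        # Gossip is just one more stream at the back of the rotation.
        streams.append(gossip)

    sent = 0
    while streams:
        stream = streams.popleft()
        if stream[0] == "gossip":
            return sent
        stream[1] -= 1
        sent += 1
        if stream[1] > 0:
            # Unfinished operations rejoin the back of the rotation.
            streams.append(stream)
    return sent


# With fair sharing, gossip waits for one chunk from every concurrent operation.
fair = chunks_before_gossip(100_000, chunks_per_operation=4)

# With a dedicated gossip stream, nothing is sent ahead of it.
dedicated = chunks_before_gossip(100_000, chunks_per_operation=4, gossip_priority=True)

print(fair, dedicated)
```

Under these assumptions the wait grows linearly with the number of concurrent 
operations, so chunking each operation does not by itself protect gossip; only 
the dedicated stream does, which is what a separate connection would also give.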

> Gossip messages subject to head of line blocking by other intra-cluster 
> traffic
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9237
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9237
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>
> Reported as an issue over less-than-perfect networks, such as VPNs between 
> data centers.
> Gossip goes over the small-message socket, where "small" is 64k, which isn't 
> particularly small. This is done for performance, to keep most traffic on one 
> hot socket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
