[ 
https://issues.apache.org/jira/browse/CASSANDRA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320700#comment-14320700
 ] 

Benedict commented on CASSANDRA-8789:
-------------------------------------

bq. is 64k a useful threshold?

It's a little arbitrary, but it seems a reasonable cutoff.

bq. Will systems regularly hit that threshold as part of operations like 
repair, streaming

It looks possible that merkle trees might be significantly larger than this, 
but I'm not certain. In this case the extra connection would simply be mostly 
idle, which isn't really a problem.

bq. One finicky thing is that we don't actually know the serialized size of 
messages

Well, we have a few options here:

# We already calculate this during serialization, so we could extract the work 
and use it twice
# Whilst building our message we could accumulate its size, since it would be 
useful elsewhere as well (such as enforcing memory utilisation constraints), 
and might reduce the cost of serialization
# We could serialize our messages into a ByteBuffer that we copy wholesale 
into the second connection's buffer if we overflow, resetting the small message 
connection's buffer (this might also obviate the need to calculate the size 
during serialization, as we could leave room prefixing the message 
serialization location and annotate its size once serialization completes)
# We could bound the number of elements we have to visit by tracking the 
average size of each element, treating the whole thing as oversized if, 
assuming the average applies to the remainder of the message, we would exceed 
our limit. This would let us abort potentially very early for large messages, 
but not small ones.
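Option 3 above can be sketched roughly as follows. This is a hypothetical illustration, not the patch itself: the class name, the fixed-size allocation, and the use of a raw byte[] payload in place of real message serialization are all assumptions made for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of option 3: serialize into a ByteBuffer, leaving a
// 4-byte slot before the payload so the size can be annotated once
// serialization completes. If the resulting size exceeds the threshold
// (e.g. 64k), the whole buffer could be copied wholesale into the large
// message connection's buffer and the small message buffer reset.
public class SizePrefixedSerializer
{
    static ByteBuffer serializeWithSizePrefix(byte[] payload)
    {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        int sizeSlot = buf.position();
        buf.position(sizeSlot + 4);       // leave room for the length prefix
        buf.put(payload);                 // stand-in for message serialization
        int size = buf.position() - sizeSlot - 4;
        buf.putInt(sizeSlot, size);       // backfill the size after the fact
        buf.flip();
        return buf;
    }

    public static void main(String[] args)
    {
        byte[] body = "MUTATION".getBytes(StandardCharsets.UTF_8);
        ByteBuffer framed = serializeWithSizePrefix(body);
        System.out.println("framed size = " + framed.getInt());
    }
}
```

The key point is that no up-front size calculation is needed: the 4-byte gap is reserved before serialization and written with an absolute putInt afterwards.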


> Revisit how OutboundTcpConnection pools two connections for different message 
> types
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8789
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8789
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 3.0
>
>
> I was looking at this trying to understand what messages flow over which 
> connection.
> For reads the request goes out over the command connection and the response 
> comes back over the ack connection.
> For writes the request goes out over the command connection and the response 
> comes back over the command connection.
> Reads get a dedicated socket for responses. Mutation commands and responses 
> both travel over the same socket along with read requests.
> Sockets are used uni-directionally, so there are actually four sockets in 
> play and four threads at each node (2 inbound, 2 outbound).
> CASSANDRA-488 doesn't leave a record of what the impact of this change was. 
> If someone remembers what situations were made better it would be good to 
> know.
> I am not clear on when/how this is helpful. The consumer side shouldn't be 
> blocking so the only head of line blocking issue is the time it takes to 
> transfer data over the wire.
> If message size is the cause of blocking issues, then the current design 
> mixes small messages and large messages on the same connection, retaining 
> the head-of-line blocking.
> Read requests share the same connection as write requests (which are large), 
> and write acknowledgments (which are small) share the same connections as 
> write requests. The only winner is read acknowledgements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)