[
https://issues.apache.org/jira/browse/CASSANDRA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Brown updated CASSANDRA-14174:
------------------------------------
Description:
I have personally tripped up on this function a couple of times over the years,
believing that it contributes to bugs in some way or another. While I have not
found that (necessarily!) to be the case, I feel this function is completely
useless in the grand scope of things.
Going back through the mists of time (that is, {{git log}}), it appears this
function was part of the original code drop from Facebook when they
open-sourced Cassandra. Looking at the {{#doSort()}} method, all it does is sort
the incoming list of {{GossipDigest}} instances by the difference between the
remote node's maxValue for a given peer and the local node's maxValue.
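To make the ordering concrete, here is a minimal sketch of that kind of sort. It is not the actual Cassandra code: the class {{ScuttleDepthSketch}}, the simplified {{Digest}} type, and the {{localMaxVersions}} map are hypothetical stand-ins for {{GossipDigest}} and the local endpoint state, and treating an unknown peer as version 0 is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ScuttleDepthSketch {
    /** Hypothetical stand-in for GossipDigest: a peer and the sender's max version for it. */
    static final class Digest {
        final String endpoint;
        final int maxVersion;

        Digest(String endpoint, int maxVersion) {
            this.endpoint = endpoint;
            this.maxVersion = maxVersion;
        }
    }

    /**
     * Returns a copy of the digests ordered so that the peers whose version
     * differs most from the local view come first.
     * Assumption: a peer we know nothing about has local version 0.
     */
    static List<Digest> sortByDepth(List<Digest> digests, Map<String, Integer> localMaxVersions) {
        List<Digest> sorted = new ArrayList<>(digests);
        sorted.sort(Comparator.comparingInt(
                (Digest d) -> Math.abs(localMaxVersions.getOrDefault(d.endpoint, 0) - d.maxVersion))
                .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> local = new HashMap<>();
        local.put("a", 10);
        local.put("b", 3);

        List<Digest> incoming = new ArrayList<>();
        incoming.add(new Digest("a", 12)); // differs from local by 2
        incoming.add(new Digest("b", 9));  // differs from local by 6
        incoming.add(new Digest("c", 1));  // unknown locally, differs by 1

        List<String> order = new ArrayList<>();
        for (Digest d : sortByDepth(incoming, local)) {
            order.add(d.endpoint);
        }
        System.out.println(order); // prints [b, a, c]
    }
}
```

The order only matters to a receiver that truncates its reply after the first few entries. A receiver that answers every digest produces the same set of differences whatever the order.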
The only universe where this is actually an optimization is if you go back and
read the [Scuttlebutt
paper|https://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf] (upon which
Cassandra's Gossip anti-entropy reconciliation is based). The end of section
3.2 describes ordering of the incoming digests such that, in the case where you
do not return all of the differences (because you are optimizing for the return
message size), you can gather the differences for the peers which are most
out of sync. The ordering implemented in Cassandra is the second ordering
described in the paper, called "scuttle depth".
As we always send all differences between two nodes (message size be damned),
this optimization, borrowed from the paper, is largely irrelevant for
Cassandra's purposes.
Thus, I propose we remove this method for the following gains:
- less garbage created
- less CPU (sure, it's mostly trivial; see next point)
- less time spent on unnecessary functionality on the *single threaded* gossip
stage.
was:
I have personally tripped up on this function a couple of times over the years,
believing that it contributes to bugs in some way or another. While I have not
found that (necessarily!) to be the case, I feel this function is completely
useless in the grand scope of things.
Going back through the mists of time (that is, {{git log}}), it appears this
function was part of the original code drop from Facebook when they open
sourced cassandra. Looking at the {{#doSort()}} method, all it does is sort the
incoming list of \{{GossipDigest}} s by the difference between the remote
node's maxValue for a given peer and the local nodes' maxValue.
The only universe where is actually an optimization is if you go back and read
the [Scuttlebutt
paper|https://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf] (upon which
cassandra's Gossip anti-reconcilliation is based). The end of section 3.2
describes ordering of the incoming digests such that, in the case where you do
not return all of the differences (because you are optimizing for the return
message size), you can gather the differences for the peers which are most of
out sync. The ordering implemented in cassandra is the second ordering
described in the paper, called "scuttle depth".
As we always send all differences between two nodes (message size be damned),
this optimization, borrowed from the paper, is largely irrelevant for
Cassandra's purposes.
Thus, I propose we remove this method for the following gains:
- less garbage created
- less CPU (sure, it's mostly trivial; see next point)
- less time spent on unnecessary functionality on the *single threaded* gossip
stage.
> Remove GossipDigestSynVerbHandler#doSort()
> ------------------------------------------
>
> Key: CASSANDRA-14174
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14174
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Fix For: 4.x
>
>
> I have personally tripped up on this function a couple of times over the
> years, believing that it contributes to bugs in some way or another. While I
> have not found that (necessarily!) to be the case, I feel this function is
> completely useless in the grand scope of things.
> Going back through the mists of time (that is, {{git log}}), it appears this
> function was part of the original code drop from Facebook when they
> open-sourced Cassandra. Looking at the {{#doSort()}} method, all it does is sort
> the incoming list of {{GossipDigest}} instances by the difference between the
> remote node's maxValue for a given peer and the local node's maxValue.
> The only universe where this is actually an optimization is if you go back
> and read the [Scuttlebutt
> paper|https://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf] (upon which
> Cassandra's Gossip anti-entropy reconciliation is based). The end of section
> 3.2 describes ordering of the incoming digests such that, in the case where
> you do not return all of the differences (because you are optimizing for the
> return message size), you can gather the differences for the peers which are
> most out of sync. The ordering implemented in Cassandra is the second
> ordering described in the paper, called "scuttle depth".
> As we always send all differences between two nodes (message size be damned),
> this optimization, borrowed from the paper, is largely irrelevant for
> Cassandra's purposes.
> Thus, I propose we remove this method for the following gains:
> - less garbage created
> - less CPU (sure, it's mostly trivial; see next point)
> - less time spent on unnecessary functionality on the *single threaded*
> gossip stage.