[
https://issues.apache.org/jira/browse/CASSANDRA-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344515#comment-16344515
]
Vincent White edited comment on CASSANDRA-14192 at 12/3/18 4:58 AM:
--------------------------------------------------------------------
This is because we now use RangeAwareSSTableWriter to write out the incoming
streams to disk. Its getFilename method returns just the keyspace/table rather
than a complete filename (since it can write out more than one file during its
existence). This confuses the receivingFiles/sendingFiles maps in SessionInfo,
which are keyed on the output filename.
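To illustrate the failure mode described above, here is a minimal, hypothetical Java model. It is not Cassandra's actual SessionInfo or ProgressInfo code. Progress entries are stored in a map keyed on a file name, and a writer that covers several output files reports only "ks1/table1". The class, record, and method names, and the per-file `#seq` key suffix, are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of per-session receive progress tracking.
 * Not Cassandra's SessionInfo: it only mimics a map of progress entries
 * keyed on a file name string.
 */
public class ReceiveProgressDemo {

    /** Progress for one incoming file (hypothetical, loosely modeled on ProgressInfo). */
    public record Progress(String fileName, long bytesReceived, long totalBytes) {
        public boolean isCompleted() {
            return bytesReceived >= totalBytes;
        }
    }

    /** Counts completed entries, assuming one map entry represents one file. */
    public static long completedFiles(Map<String, Progress> files) {
        return files.values().stream().filter(Progress::isCompleted).count();
    }

    /** Sums the bytes recorded across all map entries. */
    public static long bytesReceived(Map<String, Progress> files) {
        return files.values().stream().mapToLong(Progress::bytesReceived).sum();
    }

    /**
     * Records fileCount completed 100-byte files for the same table.
     * keyPerFile=false: key on the writer's name (keyspace/table only).
     * keyPerFile=true:  key on a hypothetical per-file identifier.
     */
    public static Map<String, Progress> track(int fileCount, boolean keyPerFile) {
        Map<String, Progress> files = new HashMap<>();
        for (int seq = 0; seq < fileCount; seq++) {
            Progress p = new Progress("ks1/table1", 100, 100);
            String key = keyPerFile ? p.fileName() + "#" + seq : p.fileName();
            // With the shared key, each put overwrites the previous file's entry.
            files.put(key, p);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, Progress> byWriterName = track(3, false);
        Map<String, Progress> byFile = track(3, true);
        System.out.println(completedFiles(byWriterName) + " " + bytesReceived(byWriterName));
        System.out.println(completedFiles(byFile) + " " + bytesReceived(byFile));
    }
}
```

Running `main` prints `1 100` and then `3 300`: with the shared key, three completed files collapse into a single map entry. This is consistent with the receiving node reporting far fewer received files and bytes than the senders report as sent. The `#seq` suffix is only a stand-in for whatever unique per-file key a fix might use.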
I have been planning an update to netstats to correctly output this information
again. I'll update this ticket when I have something useful.
> netstats information mismatch between senders and receivers
> -----------------------------------------------------------
>
> Key: CASSANDRA-14192
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14192
> Project: Cassandra
> Issue Type: Bug
> Components: Observability
> Reporter: Jonathan Ballet
> Assignee: Vincent White
> Priority: Minor
>
> When adding a new node to an existing cluster, the {{netstats}} command,
> called while the node is joining, shows different statistics on the node
> receiving the data than on the nodes sending the data.
> Receiving node:
> {code}
> Mode: JOINING
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
> /172.20.13.184
> /172.20.30.7
> Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total
> [...]
> /172.20.40.128
> /172.20.16.45
> Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 GiB total
> [...]
> /172.20.9.63
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name Active Pending Completed Dropped
> Large messages n/a 0 0 0
> Small messages n/a 0 11121 0
> Gossip messages n/a 0 32690 0
> {code}
> Sending node 1:
> {code}
> Mode: NORMAL
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
> /172.20.21.19
> Sending 433 files, 36.64 GiB total. Already sent 433 files, 36.64 GiB total
> [...]
> Read Repair Statistics:
> Attempted: 680832
> Mismatch (Blocking): 716
> Mismatch (Background): 279
> Pool Name Active Pending Completed Dropped
> Large messages n/a 2 123307 4
> Small messages n/a 2 637010302 509
> Gossip messages n/a 23 798851 11535
> {code}
> Sending node 2:
> {code}
> Mode: NORMAL
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
> /172.20.21.19
> Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB total
> [...]
> Read Repair Statistics:
> Attempted: 84967
> Mismatch (Blocking): 17568
> Mismatch (Background): 3078
> Pool Name Active Pending Completed Dropped
> Large messages n/a 2 17818 2
> Small messages n/a 2 126082304 507
> Gossip messages n/a 34 202810 11725
> {code}
> In this case, the join process has been running for a while and the sending
> nodes report that they have already sent everything. This output stays the
> same for a while though (maybe ~15% of the total joining time).
> However, the receiving node's values stay like this once the sending nodes
> have sent everything, until it goes from this state to the {{NORMAL}} state
> (so there's visually no catching up from ~86 files to ~405 files, for example;
> it goes directly from the state shown above to {{NORMAL}}).
> This makes tracking the progress of the join process more difficult than it
> needs to be, because we need to compare and deduce the actual state from both
> the receiving node's values and the sending nodes' values, which are both "not
> correct" (the sending nodes say everything has been sent but stay in this
> state for a long time, while the receiving node says it still needs to
> download a lot of files/data before finishing).
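When comparing the two sides by hand, it can help to turn the summary lines into percentages. The following is a hypothetical Java helper, not part of Cassandra or nodetool. It assumes the "Receiving/Sending N files, X GiB total. Already received/sent M files, Y GiB total" wording seen in the netstats excerpts in this issue. That wording may differ between versions, and the class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: parses a netstats "Receiving ..." or "Sending ..." summary
 * line (format assumed from the excerpts in this issue) and computes progress.
 */
public class NetstatsProgress {

    public record Progress(long totalFiles, double totalGiB, long doneFiles, double doneGiB) {
        public double filePercent() { return 100.0 * doneFiles / totalFiles; }
        public double bytePercent() { return 100.0 * doneGiB / totalGiB; }
    }

    // Matches both the receiver-side and the sender-side wording.
    private static final Pattern LINE = Pattern.compile(
        "(?:Receiving|Sending) (\\d+) files, ([\\d.]+) GiB total\\. "
        + "Already (?:received|sent) (\\d+) files, ([\\d.]+) GiB total");

    public static Progress parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized netstats line: " + line);
        }
        return new Progress(
            Long.parseLong(m.group(1)), Double.parseDouble(m.group(2)),
            Long.parseLong(m.group(3)), Double.parseDouble(m.group(4)));
    }

    public static void main(String[] args) {
        Progress receiver = parse(
            "Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total");
        Progress sender = parse(
            "Sending 433 files, 36.64 GiB total. Already sent 433 files, 36.64 GiB total");
        System.out.printf("receiver: %.1f%% of files, %.1f%% of bytes%n",
            receiver.filePercent(), receiver.bytePercent());
        System.out.printf("sender:   %.1f%% of files, %.1f%% of bytes%n",
            sender.filePercent(), sender.bytePercent());
    }
}
```

With the first receiver/sender pair from this issue, the receiver appears to be about 20% done by file count and about 13% by bytes, while the sender reports 100%. This is the gap described above.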
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)