[jira] [Comment Edited] (CASSANDRA-14192) netstats information mismatch between senders and receivers

Vincent White (JIRA) Sun, 02 Dec 2018 20:59:12 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344515#comment-16344515 ]


Vincent White edited comment on CASSANDRA-14192 at 12/3/18 4:58 AM:
--------------------------------------------------------------------

This is because we now use RangeAwareSSTableWriter to write out the incoming 
streams to disk. Its getFilename method returns just the keyspace/table rather 
than a complete filename (since it can write out more than one file during its 
existence). This confuses the receivingFiles/sendingFiles maps in SessionInfo, 
which are keyed on the output filename. 
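A minimal sketch of why a filename-keyed progress map under-counts on the receiving side when the writer reports only keyspace/table. The `Progress` record, the `track` helper, and the SSTable filenames are hypothetical stand-ins for illustration, not Cassandra's actual ProgressInfo/SessionInfo classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilenameKeyCollision {
    // Hypothetical per-file progress record (not Cassandra's ProgressInfo).
    record Progress(String fileName, long currentBytes, long totalBytes) {}

    // Hypothetical stand-in for a map keyed on the output filename.
    static Map<String, Progress> track(List<Progress> updates) {
        Map<String, Progress> byFile = new HashMap<>();
        for (Progress p : updates) {
            // A later update with the same key overwrites the earlier entry.
            byFile.put(p.fileName(), p);
        }
        return byFile;
    }

    public static void main(String[] args) {
        // Sender side: each streamed file has a distinct, complete name (names are illustrative).
        List<Progress> sent = List.of(
            new Progress("ks/tbl/nb-1-big-Data.db", 100, 100),
            new Progress("ks/tbl/nb-2-big-Data.db", 200, 200),
            new Progress("ks/tbl/nb-3-big-Data.db", 300, 300));
        // Receiver side: the writer reports only "ks/tbl", so every file shares one key.
        List<Progress> received = List.of(
            new Progress("ks/tbl", 100, 100),
            new Progress("ks/tbl", 200, 200),
            new Progress("ks/tbl", 300, 300));

        System.out.println("sender entries:   " + track(sent).size());
        System.out.println("receiver entries: " + track(received).size());
    }
}
```

Running it prints 3 entries for the sender and 1 for the receiver: three files were transferred, but the receiver's map can only ever show one of them per table.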

I have been planning an update to netstats to correctly output this information 
again. I'll update this ticket when I have something useful.


was (Author: vincentwhite):
This is because we now use RangeAwareSSTableWriter to write out the incoming 
streams to disk. Its getFilename method returns just the keyspace/table rather 
than a complete filename (since it can write out more than one file during it's 
existence). This confuses the map of receivingFiles/sendingFiles in SessionInfo 
which is keyed on the output filename. 

I have been planning an update to netstats to correctly output this information 
again. I'll update this ticket when I have someone useful.

> netstats information mismatch between senders and receivers
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-14192
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14192
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability
>            Reporter: Jonathan Ballet
>            Assignee: Vincent White
>            Priority: Minor
>
> When adding a new node to an existing cluster, the {{netstats}} command, 
> called while the node is joining, shows different statistics on the node 
> receiving the data than on the nodes sending the data.
> Receiving node:
> {code}
> Mode: JOINING
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
>     /172.20.13.184
>     /172.20.30.7
>         Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total
>             [...]
>     /172.20.40.128
>     /172.20.16.45
>         Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 GiB total
>             [...]
>     /172.20.9.63
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         0              0         0
> Small messages                  n/a         0          11121         0
> Gossip messages                 n/a         0          32690         0
> {code}
> Sending node 1:
> {code}
> Mode: NORMAL
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
>     /172.20.21.19
>         Sending 433 files, 36.64 GiB total. Already sent 433 files, 36.64 GiB total
>             [...]
> Read Repair Statistics:
> Attempted: 680832
> Mismatch (Blocking): 716
> Mismatch (Background): 279
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         2         123307         4
> Small messages                  n/a         2      637010302       509
> Gossip messages                 n/a        23         798851     11535
> {code}
> Sending node 2:
> {code}
> Mode: NORMAL
> Bootstrap 0a599bf0-01c5-11e8-a256-8d847377f816
>     /172.20.21.19
>         Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB total
>             [...]
> Read Repair Statistics:
> Attempted: 84967
> Mismatch (Blocking): 17568
> Mismatch (Background): 3078
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         2          17818         2
> Small messages                  n/a         2      126082304       507
> Gossip messages                 n/a        34         202810     11725
> {code}
> In this case, the join process has been running for a while and the sending 
> nodes seem to say they have already sent everything. This output stays the 
> same for a while though (maybe ~15% of the total joining time).
> However, the receiving node's values stay like this once the sending nodes 
> have sent everything, until it goes from this state to the {{NORMAL}} state 
> (so there's visually no catching up from ~86 files to ~405 files, for 
> example; it goes directly from the state shown above to {{NORMAL}}).
> This makes tracking the progress of the join process more difficult than 
> necessary, because we need to compare and deduce the actual state from both 
> the receiving node's values and the sending nodes' values, which are both "not 
> correct" (the sending nodes say everything has been sent but stay in this 
> state for a long time, while the receiving node says it still needs to 
> download a lot of files/data before finishing).
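Until netstats reports this consistently, one workaround is to put the sender and receiver summary lines side by side. A minimal sketch that parses the summary lines in the format shown in the output above; the `NetstatsProgress` class, `Summary` record, and regex are hypothetical, only "GiB" units are handled, and the receiver's counts are the ones this ticket shows to be understated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NetstatsProgress {
    // Matches summary lines such as
    // "Receiving 433 files, 36.64 GiB total. Already received 88 files, 4.6 GiB total".
    // Assumption: sizes are always reported in GiB; other units are not handled.
    private static final Pattern SUMMARY = Pattern.compile(
        "(Receiving|Sending) (\\d+) files, ([\\d.]+) GiB total\\. "
            + "Already (?:received|sent) (\\d+) files, ([\\d.]+) GiB total");

    record Summary(String direction, int totalFiles, double totalGiB, int doneFiles, double doneGiB) {
        double filePercent() {
            return 100.0 * doneFiles / totalFiles;
        }
    }

    static Summary parse(String line) {
        Matcher m = SUMMARY.matcher(line.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("not a netstats summary line: " + line);
        }
        return new Summary(m.group(1),
            Integer.parseInt(m.group(2)), Double.parseDouble(m.group(3)),
            Integer.parseInt(m.group(4)), Double.parseDouble(m.group(5)));
    }

    public static void main(String[] args) {
        // Values taken from the receiving node and sending node 2 in the report.
        Summary receiver = parse("Receiving 405 files, 38.3 GiB total. Already received 86 files, 6.02 GiB total");
        Summary sender = parse("Sending 405 files, 38.3 GiB total. Already sent 405 files, 38.3 GiB total");
        System.out.printf("receiver: %.1f%% of files, sender: %.1f%% of files%n",
            receiver.filePercent(), sender.filePercent());
    }
}
```

This only makes the discrepancy visible; it cannot recover the receiver's true progress, which is what the fix to the filename-keyed maps is meant to address.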



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
