Sylvain Lebresne created CASSANDRA-12791:

             Summary: MessageIn logic to determine if the message is cross-node 
is wrong
                 Key: CASSANDRA-12791
             Project: Cassandra
          Issue Type: Bug
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
            Priority: Minor

{{MessageIn}} has the following code to read the 'creation time' of the message 
on the receiving side:
public static ConstructionTime readTimestamp(InetAddress from, DataInputPlus 
input, long timestamp) throws IOException
    // make sure to readInt, even if cross_node_to is not enabled
    int partial = input.readInt();
    long crossNodeTimestamp = (timestamp & 0xFFFFFFFF00000000L) | (((partial & 
0xFFFFFFFFL) << 2) >> 2);
    if (timestamp > crossNodeTimestamp)
        MessagingService.instance().metrics.addTimeTaken(from, timestamp - 
        return new ConstructionTime(crossNodeTimestamp, timestamp != 
        return new ConstructionTime();
where {{timestamp}} is really the local time on the receiving node when calling 
that method.

The incorrect part, I believe, is the {{timestamp != crossNodeTimestamp}} used 
to set the {{isCrossNode}} field of {{ConstructionTime}}. A first problem is 
that this will basically always be {{true}}: for it to be {{false}}, we'd need 
the low-bytes of the timestamp taken on the sending node to coincide exactly 
with the ones taken on the receiving side, which is _very_ unlikely. It is also 
a relatively meaningless test: having that test be {{false}} basically means 
the lack of clock sync between the 2 nodes is exactly the time the 2 calls to 
{{System.currentTimeMillis()}} (on sender and receiver), which is definitively 
not what we care about.

What the result of this test is used for is to determine if the message was 
crossNode or local. It's used to increment different metrics (we separate 
metric local versus crossNode dropped messages) in {{MessagingService}} for 
instance. And that's where this is kind of a bug: not only the {{timestamp != 
crossNodeTimestamp}}, but if {{DatabaseDescriptor.hasCrossNodeTimeout()}}, we 
*always* have this {{isCrossNode}} false, which means we'll never increment the 
"cross-node dropped messages" metric, which is imo unexpected.

That is, it is true that if {{DatabaseDescriptor.hasCrossNodeTimeout() == 
false}}, then we end using the receiver side timestamp to timeout messages, and 
so you end up only dropping messages that timeout locally. And _in that sense_, 
always incrementing the "locally" dropped messages metric is not completely 
illogical. But I doubt most users are aware of those pretty specific nuance 
when looking at the related metrics, and I'm relatively sure users expect a 
metrics named {{droppedCrossNodeTimeout}} to actually count cross-node messages 
by default (keep in mind that {{DatabaseDescriptor.hasCrossNodeTimeout()}} is 
actually false by default).

Anyway, to sum it up I suggest that the following change should be done:
# the {{timestamp != crossNodeTimestamp}} test is definitively not what we 
want. We should at a minimum just replace it to {{true}} as that's basically 
what it ends up being except for very rare and arguably random cases.
# given how the {{ConstructionTime.isCrossNode}} is used, I suggest that we 
really want it to mean if the message has shipped cross-node, not just be a 
synonymous of {{DatabaseDescriptor.hasCrossNodeTimeout()}}. It should be 
whether the message shipped cross-node, i.e. whether {{from == 
BroadcastAdress()}} or not.

This message was sent by Atlassian JIRA

Reply via email to