Sylvain Lebresne updated CASSANDRA-12791:
    Reviewer: Stefania
      Status: Patch Available  (was: Open)

I'm attaching a patch with the changes described above:
| [12791-trunk|https://github.com/pcmanus/cassandra/commits/12791-trunk] | 
[utests|http://cassci.datastax.com/job/pcmanus-12791-trunk-testall] | 
[dtests|http://cassci.datastax.com/job/pcmanus-12791-trunk-dtest] |

The patch actually does 2 (less important) things on top of the points above:
# it uses {{ApproximateTime.currentTimeMillis()}} for anything related to the 
message "construction time" timestamp. This was currently a bit inconsistent in 
that the timestamp was set using {{System.currentTimeMillis()}} but later 
compared to {{ApproximateTime.currentTimeMillis()}}. This is not really a 
problem in practice but I'm a stickler for consistency so that was bugging me.
# the patch gets rid of the {{ConstructionTime}} class. This is arguably a bit 
of a personal opinion here since the behavior of that class is still there, 
it's just somewhat "inlined" in {{MessageIn}}/{{Monitorable}}. The reason is 
that after this patch, grouping the 2 infos of that class (timestamp and 
isCrossNode boolean) together doesn't make sense to me: the {{isCrossNode}} 
property is one of message/operation, not of the timestamp.

Also, the patch is currently against trunk. It's probably worth at least 
putting it 3.X though, so unless object to that I'll rebase to that branch once 
I have test results for trunk. I'm not sure it's worth bothering with 3.0 
though: as said in the description, I consider it kind of a bug in that not 
having "crossNode" metrics be incremented by default will almost surely 
surprise users, but it's also fairly minor and could be called a "weird 

> MessageIn logic to determine if the message is cross-node is wrong
> ------------------------------------------------------------------
>                 Key: CASSANDRA-12791
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12791
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
> {{MessageIn}} has the following code to read the 'creation time' of the 
> message on the receiving side:
> {noformat}
> public static ConstructionTime readTimestamp(InetAddress from, DataInputPlus 
> input, long timestamp) throws IOException
> {
>     // make sure to readInt, even if cross_node_to is not enabled
>     int partial = input.readInt();
>     long crossNodeTimestamp = (timestamp & 0xFFFFFFFF00000000L) | (((partial 
> & 0xFFFFFFFFL) << 2) >> 2);
>     if (timestamp > crossNodeTimestamp)
>     {
>         MessagingService.instance().metrics.addTimeTaken(from, timestamp - 
> crossNodeTimestamp);
>     }
>     if(DatabaseDescriptor.hasCrossNodeTimeout())
>     {
>         return new ConstructionTime(crossNodeTimestamp, timestamp != 
> crossNodeTimestamp);
>     }
>     else
>     {
>         return new ConstructionTime();
>     }
> }
> {noformat}
> where {{timestamp}} is really the local time on the receiving node when 
> calling that method.
> The incorrect part, I believe, is the {{timestamp != crossNodeTimestamp}} 
> used to set the {{isCrossNode}} field of {{ConstructionTime}}. A first 
> problem is that this will basically always be {{true}}: for it to be 
> {{false}}, we'd need the low-bytes of the timestamp taken on the sending node 
> to coincide exactly with the ones taken on the receiving side, which is 
> _very_ unlikely. It is also a relatively meaningless test: having that test 
> be {{false}} basically means the lack of clock sync between the 2 nodes is 
> exactly the time the 2 calls to {{System.currentTimeMillis()}} (on sender and 
> receiver), which is definitively not what we care about.
> What the result of this test is used for is to determine if the message was 
> crossNode or local. It's used to increment different metrics (we separate 
> metric local versus crossNode dropped messages) in {{MessagingService}} for 
> instance. And that's where this is kind of a bug: not only the {{timestamp != 
> crossNodeTimestamp}}, but if {{DatabaseDescriptor.hasCrossNodeTimeout()}}, we 
> *always* have this {{isCrossNode}} false, which means we'll never increment 
> the "cross-node dropped messages" metric, which is imo unexpected.
> That is, it is true that if {{DatabaseDescriptor.hasCrossNodeTimeout() == 
> false}}, then we end using the receiver side timestamp to timeout messages, 
> and so you end up only dropping messages that timeout locally. And _in that 
> sense_, always incrementing the "locally" dropped messages metric is not 
> completely illogical. But I doubt most users are aware of those pretty 
> specific nuance when looking at the related metrics, and I'm relatively sure 
> users expect a metrics named {{droppedCrossNodeTimeout}} to actually count 
> cross-node messages by default (keep in mind that 
> {{DatabaseDescriptor.hasCrossNodeTimeout()}} is actually false by default).
> Anyway, to sum it up I suggest that the following change should be done:
> # the {{timestamp != crossNodeTimestamp}} test is definitively not what we 
> want. We should at a minimum just replace it to {{true}} as that's basically 
> what it ends up being except for very rare and arguably random cases.
> # given how the {{ConstructionTime.isCrossNode}} is used, I suggest that we 
> really want it to mean if the message has shipped cross-node, not just be a 
> synonymous of {{DatabaseDescriptor.hasCrossNodeTimeout()}}. It should be 
> whether the message shipped cross-node, i.e. whether {{from == 
> BroadcastAdress()}} or not.

This message was sent by Atlassian JIRA

Reply via email to