[ 
https://issues.apache.org/jira/browse/CASSANDRA-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802209#comment-16802209
 ] 

Jon Meredith commented on CASSANDRA-15064:
------------------------------------------

I can confirm the behavior you're seeing, however I don't think it's something 
that should be changed.

Cassandra uses the ClockSeq in the manner described in section 4.1.5 as a 
mechanism to avoid generating duplicate UUIDs if the server is restarted witht 
the clock in the past, or comes up with a different node id [1]. As an aside, 
if the time reported by the OS via System.currentTimeMillis goes backwards 
while the server is running, the *timestamp* (not clock seq) will be 
constrained to be 100ns later than the previous timestamp returned until time 
catches up (or Cassandra is restarted).

The node id and clock sequence are initialized on server startup and should not 
change during execution, so I was wondering if you generated them 
synthetically? Here are the UUIDs you supplied broken into components by 
java.util.UUID

{noformat}
1 UUID 84e2c963-4ef9-11e9-b57e-f0def1d0755e = variant[0b10] version[0b1] 
timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x0000357e]
2 UUID 84e2c963-4ef9-11e9-b57f-f0def1d0755e = variant[0b10] version[0b1] 
timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x0000357f]
3 UUID 84e2c963-4ef9-11e9-b580-f0def1d0755e = variant[0b10] version[0b1] 
timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x00003580]
4 UUID 84e2c963-4ef9-11e9-b581-f0def1d0755e = variant[0b10] version[0b1] 
timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x00003581]
5 UUID 84e2c963-4ef9-11e9-b582-f0def1d0755e = variant[0b10] version[0b1] 
timestamp[0x01e94ef984e2c963] node[0xf0def1d0755e] clockseq[0x00003582]
{noformat}

The reason the UUID [3] and TimeUUID [4] types compare differently for the 
clock sequence (and node) is that they have slightly different comparison 
functions. The TimeUUID treats clk_seq_hi as a signed byte and then 
clk_seq_low/node as a series of unsigned bytes (not a single integer), however 
the UUIDType comparison just treats it as a single signed long. It looks like 
this behavior was preserved during the performance refactor in CASSANDRA-8730.

As for the correct ordering, although the spec does say to treat the clock 
sequence as an unsigned integer, it doesn't really convey any meaning as it is 
assigned from a random number generator.  (FWIW, The RFC also recommends 
lexical ordering by time_low, time_mid, then time_hi which seemed odd to me but 
is confirmed by reading the uuid_compare function in Appendix A).

For both TimeUUID / UUID the primary ordering is by 100-nsec resolution 
timestamp first which is what we want when dealing with timestamps, then 
deterministically on the node/clockseq (although unfortunately in a different 
order depending on TimeUUID/UUID).

Changing the comparison function affects the order data is stored on disk and 
would require users to perform some kind of migration (automated or manual), 
and although it would be nice to make them both the same, and perhaps change 
the clockseq to be treated as an unsigned int to reduce surprise I don't think 
the value is there to make any change.

Just in case I've missed the point, would you mind answering a couple of 
questions

# How did you notice this issue, was it with synthetic tests or through 
production usage?
# Why does it matter to you?  Are you trying to integrate with an external 
system that expects the TimeUUIDs to be ordered in a specific way?


[1] node id for Cassandra is generated from a hash of local IP addresses as the 
comments claim the MAC address was not easily accessible from Java. I'm not 
sure if the order of the hashed addresses is stable so it's possible the same 
node with same networking could come up with a different node.  
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/utils/UUIDGen.java#L349
  The method has been changed for trunk, but assuming you're not running it.

[2] Initializing clock seq 
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/utils/UUIDGen.java#L295

[3] UUID comparison 
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/marshal/UUIDType.java#L90

[4] TimeUUID comparison 
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/marshal/TimeUUIDType.java#L44


> Wrong ordering for timeuuid fields
> ----------------------------------
>
>                 Key: CASSANDRA-15064
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15064
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Andreas Andersen
>            Assignee: Jon Meredith
>            Priority: Normal
>         Attachments: example.cql
>
>
> Hi!
> We're seeing some strange behavior for the ordering of timeuuid fields. They 
> seem to be sorted in the wrong order when the clock_seq_low field in a 
> timeuuid goes from 7f to 80. Consider the following example:
> {noformat}
> cqlsh:test> show version; 
> [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4] 
> cqlsh:test> CREATE TABLE t ( 
>         ...     partition   int, 
>         ...     t           timeuuid, 
>         ...     i           int, 
>         ...  
>         ...     PRIMARY KEY(partition, t) 
>         ... ) 
>         ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t                                    | i 
> -----------+--------------------------------------+--- 
>          1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>          1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>          1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>          1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>          1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>  
> (5 rows) 
> cqlsh:test>
> {noformat}
> The expected behavior is that the rows are returned in the same order as they 
> were inserted (we inserted them with their clustering key in an ascending 
> order). Instead, the order "wraps" in the middle.
> This issue only arises when the 9th octet (clock_seq_low) in the uuid goes 
> from 7f to 80. A guess would be that the comparison is implemented as a 
> signed integer instead of an unsigned integer, as 0x7f = 127 and 0x80 = -128. 
> According to the RFC, the field should be treated as an unsigned integer: 
> [https://tools.ietf.org/html/rfc4122#section-4.1.2]
> Changing the field from a timeuuid to a uuid gives the expected correct 
> behavior:
> {noformat}
> cqlsh:test> CREATE TABLE t ( 
>         ...     partition   int, 
>         ...     t           uuid, 
>         ...     i           int, 
>         ...  
>         ...     PRIMARY KEY(partition, t) 
>         ... ) 
>         ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t                                    | i 
> -----------+--------------------------------------+--- 
>          1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>          1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>          1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>          1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>          1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>  
> (5 rows) 
> cqlsh:test>{noformat}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to