[ 
https://issues.apache.org/jira/browse/CASSANDRA-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040168#comment-14040168
 ] 

Benedict commented on CASSANDRA-6108:
-------------------------------------

Is this approach inherently incompatible with client-provided timestamps? As 
far as replacing timestamps is concerned, anyway; not necessarily as a 
datatype.

I think solving this problem properly is going to be very challenging, but I'd 
like to propose the following rough sketch of a solution. Note that this 
doesn't solve timeid64 so much as provide mostly-unique, cluster-wide 
timestamps in 64 bits or fewer that can be generated by the client: 

# I propose each client auto-generates a 20-bit id on startup. We can try to 
make this guaranteed unique, but I think a random number is probably 
sufficient. 
# We define rolling epochs, each ~6 days apart, which is ~half the addressable 
ms interval in 32 bits; i.e. given any full ms time, we split it into its most 
recent epoch plus its delta from that epoch.
# Each client then produces a timestamp made up of 32 bits of current time (in 
millis) since the most recent epoch, a local monotonically increasing 14-bit 
value that is reset each ms, and its unique id.
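
The three steps above could be sketched roughly as follows. This is an 
illustration only, not Cassandra code: every name in it (EPOCH_MS, 
split_epoch, pack, ClientTimestampGenerator) is hypothetical. Note that 
32 + 14 + 20 bits comes to 66, so the sketch assumes a 30-bit delta in order 
to fit in 64 bits; that assumption also lines up with the ~6 day figure, since 
2^30 ms is about 12.4 days and half of that is about 6.2 days.

```python
import secrets
import time

DELTA_BITS = 30       # assumption: 30 rather than 32, so the total is 64 bits
COUNTER_BITS = 14     # per-millisecond counter, as in the proposal
CLIENT_ID_BITS = 20   # random client id, as in the proposal
EPOCH_MS = 1 << (DELTA_BITS - 1)  # ~6.2 days: half the 30-bit ms range


def split_epoch(now_ms):
    """Split a full millisecond time into (epoch index, delta since that epoch)."""
    return divmod(now_ms, EPOCH_MS)


def pack(delta_ms, counter, client_id):
    """Lay the fields out as delta | counter | client id, most significant first."""
    assert 0 <= delta_ms < (1 << DELTA_BITS)
    assert 0 <= counter < (1 << COUNTER_BITS)
    assert 0 <= client_id < (1 << CLIENT_ID_BITS)
    return ((delta_ms << (COUNTER_BITS + CLIENT_ID_BITS))
            | (counter << CLIENT_ID_BITS)
            | client_id)


class ClientTimestampGenerator:
    """Hypothetical client-side generator; not thread-safe."""

    def __init__(self, client_id=None):
        # Random 20-bit id picked at startup: probably unique, not guaranteed.
        if client_id is None:
            client_id = secrets.randbits(CLIENT_ID_BITS)
        self.client_id = client_id
        self.last_ms = -1
        self.counter = 0

    def next(self, now_ms=None):
        """Return (epoch index, packed 64-bit timestamp)."""
        if now_ms is None:
            now_ms = time.time_ns() // 1_000_000
        if now_ms != self.last_ms:
            # New millisecond: reset the local counter.
            self.last_ms = now_ms
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= (1 << COUNTER_BITS):
                raise OverflowError("more than 2**14 timestamps in one ms")
        epoch, delta = split_epoch(now_ms)
        return epoch, pack(delta, self.counter, self.client_id)
```

Because the delta occupies the high bits, two timestamps from the same epoch 
order by time first, then by counter, then by client id. The sketch does not 
handle a client clock that moves backwards.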

On the cluster we ensure memtables are flushed at least once per epoch, with 
the epoch appearing in the metadata, and we consider a full timestamp to be a 
composite of the stored timestamp combined with the epoch. Once all data 
prior to an epoch is fully repaired, we can optionally save 32-bits per cell by 
stripping out the per-node and monotonically increasing timestamp values on 
compaction. The added complexity, as far as I can tell, will be in repairs, 
hints and compaction, which need to ensure they compare a 96-bit timestamp 
instead of a 64-bit one. But in compaction at least this might actually 
simplify matters, as reconcile knows in advance which sstables it prefers data 
from.
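
One way to read "a composite of the timestamp stored combined with the epoch" 
is as an ordered pair that compares epoch first. A minimal sketch under that 
reading follows. The names FullTimestamp and reconcile are hypothetical, and 
treating the epoch as a 32-bit index is an assumption (it is one way to arrive 
at 96 bits); this is not Cassandra's actual reconcile logic.

```python
from typing import NamedTuple


class FullTimestamp(NamedTuple):
    epoch: int   # taken from sstable/memtable metadata (assumed 32-bit index)
    stored: int  # the 64-bit client timestamp stored with the cell


def reconcile(a, b):
    """Pick the (FullTimestamp, value) pair with the higher full timestamp.

    NamedTuple compares field by field, so epoch dominates and the stored
    timestamp only breaks ties within an epoch. On an exact tie the first
    argument is kept, which is an arbitrary choice for this sketch.
    """
    return a if a[0] >= b[0] else b
```

A cell written in a later epoch wins even if its stored 64-bit value is 
numerically smaller, which is the reason the epoch has to take part in every 
comparison made by repairs, hints and compaction.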

It's a pretty non-trivial change, and needs some further thought, but I think 
only non-trivial solutions are probably going to work for this non-trivial 
problem.

Some possible safety optimisations with this solution might include refusing 
client timestamps that are not within some sensible skew from now, e.g. within 
1 day, or 1 hour, giving a high degree of confidence the cluster is 
sufficiently in sync, since old timestamps should only appear during client 
retries, which should not be so badly delayed. With this solution we could 
also move to microsecond resolution if some users require it (which no doubt 
some will), using narrower epochs.
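
The skew check could be as small as the sketch below. The function name, the 
1-hour default and the symmetric treatment of past and future timestamps are 
all assumptions made for illustration.

```python
MAX_SKEW_MS = 60 * 60 * 1000  # assumed default: 1 hour


def within_skew(client_ms, server_now_ms, max_skew_ms=MAX_SKEW_MS):
    """True if a client-supplied time is within max_skew_ms of the server's clock."""
    return abs(client_ms - server_now_ms) <= max_skew_ms
```

A coordinator would refuse any write for which within_skew returns False.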

> Create timeid64 type
> --------------------
>
>                 Key: CASSANDRA-6108
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6108
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 2.1.1
>
>
> As discussed in CASSANDRA-6106, we could create a 64-bit type with 48 bits of 
> timestamp and 16 bits of unique coordinator id.  This would give us a 
> unique-per-cluster value that could be used as a more compact replacement for 
> many TimeUUID uses.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
