[
https://issues.apache.org/jira/browse/CASSANDRA-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040168#comment-14040168
]
Benedict commented on CASSANDRA-6108:
-------------------------------------
Is this approach inherently incompatible with client-provided timestamps? As
far as replacing timestamps is concerned, anyway; not necessarily as a
datatype.
I think solving this problem properly is going to be very challenging, but I'd
like to propose the following rough sketch of a solution. Note that this
doesn't solve timeid64 so much as provide mostly-unique cluster-wide
timestamps, in 64 bits or less, that can be generated by the client:
# I propose each client auto-generates a 20-bit id on startup. We can try to
make this guaranteed unique, but I think a random number is probably
sufficient.
# We define rolling epochs, each ~6 days apart, which is ~half the addressable
ms interval in 32-bits, i.e. given any full ms time we split into its most
recent epoch plus its delta from that epoch.
# Each client then produces a timestamp that is 32-bits of current time (in
millis) since the most recent epoch, a local monotonically increasing 14-bit
value that is reset each ms, and its unique id.
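The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the names `EPOCH_MS`, `split_epoch` and `ClientTimestamps` are hypothetical, epochs are assumed to be counted from Unix time zero, and the field order (delta, then counter, then client id) is one possible layout. With epochs of exactly 6 days the delta stays below 2^29 ms, so the packed value fits in 63 bits.

```python
import random

EPOCH_MS = 6 * 24 * 60 * 60 * 1000  # assumed epoch length: exactly 6 days
COUNTER_BITS = 14                   # per-millisecond counter width
ID_BITS = 20                        # client id width


def split_epoch(now_ms):
    """Split a full millisecond time into (epoch index, delta since that epoch)."""
    return divmod(now_ms, EPOCH_MS)


class ClientTimestamps:
    """Hypothetical client-side generator; clock regressions are not handled."""

    def __init__(self, client_id=None):
        # A random 20-bit id chosen at startup unless one is supplied.
        self.client_id = random.getrandbits(ID_BITS) if client_id is None else client_id
        self.last_ms = -1
        self.counter = 0

    def next(self, now_ms):
        if now_ms != self.last_ms:
            # New millisecond: reset the local counter.
            self.last_ms = now_ms
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= 1 << COUNTER_BITS:
                raise OverflowError("more than 2**14 timestamps in one millisecond")
        epoch, delta = split_epoch(now_ms)
        packed = (delta << (COUNTER_BITS + ID_BITS)) | (self.counter << ID_BITS) | self.client_id
        return epoch, packed
```

Two calls within the same millisecond differ only in the counter field, so timestamps from one client remain strictly increasing as long as its clock does not go backwards.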
On the cluster we ensure memtables are flushed at least once per epoch, with
the epoch appearing in the metadata, and we consider a full timestamp to be a
composite of the timestamp stored combined with the epoch. Once the data is
fully repaired prior to an epoch we can optionally save 32-bits per cell by
stripping out the per-node and monotonically increasing timestamp values on
compaction. The added complexity, as far as I can tell, will be in repairs,
hints and compaction which need to ensure they compare a 96-bit timestamp
instead of a 64-bit one. But in compaction at least this might actually
simplify matters, as reconcile knows in advance which sstables it prefers data
from.
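To make the composite comparison concrete, here is a minimal sketch, assuming the epoch comes from sstable or memtable metadata and the stored value is the packed client timestamp. The `Cell` type, `full_timestamp` and `reconcile` are hypothetical names for illustration, and the tie-breaking rule is an arbitrary choice rather than anything Cassandra does.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    epoch: int    # taken from the containing sstable/memtable metadata
    stored: int   # packed client timestamp stored with the cell
    value: str


def full_timestamp(cell):
    """The full timestamp is the stored timestamp combined with its epoch."""
    return (cell.epoch, cell.stored)


def reconcile(a, b):
    """Return the cell with the greater composite timestamp (ties keep `a`)."""
    return a if full_timestamp(a) >= full_timestamp(b) else b
```

Comparing the pair lexicographically means a cell from a later epoch always wins, regardless of how large the stored value from an earlier epoch is.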
It's a pretty non-trivial change, and needs some further thought, but I think
only non-trivial solutions are probably going to work for this non-trivial
problem.
Some possible safety optimisations with this solution might include refusing
client timestamps that are not within some sensible skew from now, e.g. within
1 day, or 1 hour, giving a high degree of confidence the cluster is
sufficiently in sync, since old timestamps should only appear during client
retries, which should not be so badly delayed. We could also move to micros
time if some users require it with this solution (which no doubt some will),
with narrower epochs.
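The skew check could look something like this sketch. The function name `accept_client_time` and the symmetric treatment of past and future timestamps are assumptions; the 1 hour default is one of the two bounds suggested above.

```python
MAX_SKEW_MS = 60 * 60 * 1000  # 1 hour; 1 day is the other bound suggested


def accept_client_time(client_ms, server_now_ms, max_skew_ms=MAX_SKEW_MS):
    """Accept a client-supplied time only if it is within max_skew_ms of now."""
    return abs(server_now_ms - client_ms) <= max_skew_ms
```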
> Create timeid64 type
> --------------------
>
> Key: CASSANDRA-6108
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6108
> Project: Cassandra
> Issue Type: New Feature
> Components: API, Core
> Reporter: Jonathan Ellis
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 2.1.1
>
>
> As discussed in CASSANDRA-6106, we could create a 64-bit type with 48 bits of
> timestamp and 16 bits of unique coordinator id. This would give us a
> unique-per-cluster value that could be used as a more compact replacement for
> many TimeUUID uses.