Inclusive/exclusive endpoints when compacting token ranges

Andrés de la Peña Tue, 26 Jul 2022 04:49:52 -0700

Hi all,

CASSANDRA-17575 has detected that token ranges in nodetool compact are
interpreted as closed on both sides. For example, the command "nodetool
compact -st 10 -et 50" will compact the tokens in [10, 50]. This way of
interpreting token ranges is unusual since token ranges are usually
half-open, and I think that in the previous example one would expect that
the compacted tokens would be in (10, 50]. That's for example the way
nodetool repair works, and indeed the class org.apache.cassandra.dht.Range
is always half-open.
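
To make the difference concrete, here is a minimal sketch of the two interpretations for the example above. It models tokens as plain integers and ignores wrap-around; the helper names in_closed and in_half_open are hypothetical and are not taken from the Cassandra code base.

```python
def in_closed(token, start, end):
    """[start, end]: both endpoints are included."""
    return start <= token <= end


def in_half_open(token, start, end):
    """(start, end]: start is excluded, end is included (no wrap-around here)."""
    return start < token <= end


# Illustrative tokens around the endpoints of "-st 10 -et 50".
tokens = [9, 10, 11, 50, 51]

closed = [t for t in tokens if in_closed(t, 10, 50)]
half_open = [t for t in tokens if in_half_open(t, 10, 50)]

print(closed)     # token 10 is included
print(half_open)  # token 10 is excluded
```

The only token on which the two interpretations disagree is the start token itself.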


It's worth mentioning that, unlike nodetool repair, the help and
docs for nodetool compact don't specify whether the supplied start/end
tokens are inclusive or exclusive.

I think that ideally nodetool compact should interpret the provided token
ranges as half-open, to be consistent with how token ranges are usually
interpreted. However, this would change the way the tool has worked until
now. This change might be problematic for existing users relying on the old
behaviour. That would be especially severe when the start and
end tokens are the same: interpreting the range as [x, x] compacts a
single token, whereas I think that interpreting it as (x, x] would compact all
the tokens. As for compacting ranges including multiple tokens, I think the
change wouldn't be so bad, since probably the supplied token ranges come
from tools that are already presenting the ranges as half-open. Also, if we
are splitting the full ring into smaller ranges, half-open intervals would
still work and would save us some repetitions.
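
The two points above can be sketched on a toy ring. This is a simplified model under stated assumptions, not the actual implementation: tokens are the integers 0..7, and in_half_open_wrapping is a hypothetical helper that treats a range whose start is not smaller than its end as wrapping around the ring, so that (x, x] covers every token.

```python
from collections import Counter


def in_half_open_wrapping(token, start, end):
    """(start, end] on a ring; start >= end is treated as wrapping around."""
    if start < end:
        return start < token <= end
    # Wrapping case: with start == end this is true for every token.
    return token > start or token <= end


ring = list(range(8))  # toy ring with tokens 0..7

# Same start and end token: one token when closed, the whole ring when half-open.
same_closed = [t for t in ring if 3 <= t <= 3]
same_half_open = [t for t in ring if in_half_open_wrapping(t, 3, 3)]

# Splitting the full ring into consecutive ranges that share their endpoints.
pairs = [(7, 1), (1, 3), (3, 5), (5, 7)]
half_open_hits = Counter(
    t for s, e in pairs for t in ring if in_half_open_wrapping(t, s, e)
)
# Closed interpretation, restricted to the non-wrapping pairs for simplicity.
closed_hits = Counter(
    t for s, e in pairs if s <= e for t in ring if s <= t <= e
)
repeated = sorted(t for t, n in closed_hits.items() if n > 1)

print(same_closed, same_half_open)
print(repeated)  # shared endpoints that the closed ranges visit twice
```

With half-open ranges each token falls in exactly one of the consecutive ranges, whereas the closed interpretation visits the shared endpoints twice.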

So my question is: should we change the behaviour of nodetool compact to
interpret token ranges as half-open, aligning it with the usual
interpretation of ranges? Or should we just document the current odd
behaviour to prevent compatibility issues?

A third option would be changing to half-open ranges and also forbidding
ranges where the start and end tokens are the same, to prevent the
accidental compaction of the entire ring. Note that nodetool repair also
forbids this type of token range.

What do you think?
