Hi all, CASSANDRA-17575 has found that the token ranges given to nodetool compact are interpreted as closed on both sides. For example, the command "nodetool compact -st 10 -et 50" compacts the tokens in [10, 50]. This interpretation is unusual, since token ranges are usually half-open, and I think that in this example one would expect the compacted tokens to be those in (10, 50]. That is, for example, how nodetool repair works, and indeed the class org.apache.cassandra.dht.Range is always half-open.
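To make the difference concrete, here is a small sketch of the two interpretations of "-st START -et END". It models tokens as plain longs (as with Murmur3Partitioner), and the class and method names are hypothetical. This is not Cassandra's Range/Bounds code or the actual nodetool compact implementation. The closed variant only models the non-wrapping case (start <= end), since how the current tool treats wrapping ranges isn't covered here.

```java
// Hypothetical illustration of closed vs. half-open token ranges.
// Tokens are plain longs; this is NOT Cassandra's own Range/Bounds code.
final class TokenRangeDemo {

    // Closed interval [start, end]: both endpoints are included.
    // Only the non-wrapping case (start <= end) is modelled.
    static boolean containsClosed(long start, long end, long token) {
        return start <= token && token <= end;
    }

    // Half-open interval (start, end]. When start >= end the range wraps
    // around the ring, so (x, x] covers every token.
    static boolean containsHalfOpen(long start, long end, long token) {
        if (start < end) {
            return start < token && token <= end;
        }
        return token > start || token <= end;
    }

    public static void main(String[] args) {
        // nodetool compact -st 10 -et 50
        System.out.println(containsClosed(10, 50, 10));   // true
        System.out.println(containsHalfOpen(10, 50, 10)); // false
        System.out.println(containsClosed(10, 50, 50));   // true
        System.out.println(containsHalfOpen(10, 50, 50)); // true

        // Same start and end token: a single token vs. the whole ring.
        System.out.println(containsClosed(7, 7, 1000));   // false
        System.out.println(containsHalfOpen(7, 7, 1000)); // true
    }
}
```

The only token where the two readings disagree in the [10, 50] example is the start token 10, while for start == end the disagreement covers the entire ring.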
It's worth mentioning that, unlike nodetool repair, the help and docs for nodetool compact don't specify whether the supplied start/end tokens are inclusive or exclusive.

Ideally, I think nodetool compact should interpret the provided token ranges as half-open, to be consistent with how token ranges are usually interpreted. However, this would change the way the tool has worked until now, which might be a problem for existing users relying on the old behaviour. That would be especially severe when the start and end tokens are the same: interpreting [x, x], we would compact a single token, whereas I think that interpreting (x, x] would compact all the tokens. For ranges that include multiple tokens, I think the change wouldn't be so bad, since the supplied token ranges probably come from tools that already present them as half-open. Also, if we are splitting the full ring into smaller ranges, half-open intervals would still work and would avoid repeating the boundary tokens between adjacent ranges.

So my question is: should we change the behaviour of nodetool compact to interpret token ranges as half-open, aligning it with the usual interpretation of ranges? Or should we just document the current odd behaviour to prevent compatibility issues? A third option would be to switch to half-open ranges and also forbid ranges where the start and end tokens are the same, to prevent accidental compaction of the entire ring. Note that nodetool repair also forbids this type of token range. What do you think?
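As a rough sketch of the ring-splitting point and the third option, the snippet below splits the ring at a few sorted boundary tokens into contiguous half-open ranges, where each end token is reused as the next start, and rejects ranges whose start and end tokens are equal. All names here are hypothetical and only illustrate the semantics under discussion; they are not nodetool or Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: contiguous half-open subranges of the ring, plus
// rejection of start == end. Not nodetool or Cassandra code.
final class RingSplitDemo {

    record HalfOpenRange(long start, long end) {
        HalfOpenRange {
            // Third option: reject (x, x], which would otherwise mean
            // the whole ring.
            if (start == end) {
                throw new IllegalArgumentException(
                    "start and end tokens must differ: " + start);
            }
        }

        boolean contains(long token) {
            if (start < end) {
                return start < token && token <= end;
            }
            return token > start || token <= end; // wraps around the ring
        }
    }

    // Builds (b0, b1], (b1, b2], ..., (bn-1, b0] from sorted, distinct
    // boundaries (at least two). Each boundary is reused as the next start
    // with no +1/-1 adjustment.
    static List<HalfOpenRange> split(long[] boundaries) {
        List<HalfOpenRange> ranges = new ArrayList<>();
        for (int i = 0; i < boundaries.length; i++) {
            long start = boundaries[i];
            long end = boundaries[(i + 1) % boundaries.length];
            ranges.add(new HalfOpenRange(start, end));
        }
        return ranges;
    }

    // Counts how many of the ranges contain the given token.
    static int coveringRanges(List<HalfOpenRange> ranges, long token) {
        int count = 0;
        for (HalfOpenRange r : ranges) {
            if (r.contains(token)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<HalfOpenRange> ranges = split(new long[] {-100, 0, 100});
        // Every token, including the shared boundaries, is covered once.
        for (long token : new long[] {-100, -50, 0, 50, 100, 500}) {
            System.out.println(token + " -> " + coveringRanges(ranges, token));
        }
    }
}
```

Running it prints a count of 1 for every probed token, boundaries included. With closed ranges built from the same boundaries, each shared boundary token would fall into two adjacent ranges.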