Like I mentioned, the ease with which this can introduce divergent views of the ring between live nodes is pretty dangerous. For example, starting a new node with the same host id as an existing live node will cause a collision. The existing node will not add the new node to the ring (although it will remain in gossip). Other nodes will remove the existing node from token metadata, but won't mark it down. There's no requirement for the new node to have the same tokens as the existing one either, so the topology has just completely changed without any constraints or movement of existing data. Subsequent reads and writes will be directed to different replica sets, depending on which coordinator they land on. Ownership of the host id, and the status of the affected nodes in peers' token metadata, will continue to flap whenever those nodes go down and come back up, because the question of who rightfully owns the host id is settled at startup.
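To make the collision case concrete, here is a minimal sketch of the kind of check being discussed: refuse to start if the supplied host id is already claimed by a live peer. Everything in it is hypothetical and for illustration only (`HostIdCollisionCheck`, `PeerState`, `findLiveOwner`, `validate`, and the address-keyed map standing in for gossip state); it is not Cassandra's actual API or startup path.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch only: these names are not Cassandra classes.
public final class HostIdCollisionCheck {

    /** What the starting node currently believes about one peer. */
    public static final class PeerState {
        final UUID hostId;
        final boolean alive;

        public PeerState(UUID hostId, boolean alive) {
            this.hostId = hostId;
            this.alive = alive;
        }
    }

    private HostIdCollisionCheck() {}

    /**
     * Returns the address of a live peer (other than this node) that already
     * uses the supplied host id, or empty if no live peer claims it.
     */
    public static Optional<String> findLiveOwner(UUID suppliedHostId,
                                                 String localAddress,
                                                 Map<String, PeerState> peers) {
        for (Map.Entry<String, PeerState> entry : peers.entrySet()) {
            PeerState peer = entry.getValue();
            boolean isSelf = entry.getKey().equals(localAddress);
            if (!isSelf && peer.alive && peer.hostId.equals(suppliedHostId)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    /** Throws if the supplied host id collides with a live peer. */
    public static void validate(UUID suppliedHostId,
                                String localAddress,
                                Map<String, PeerState> peers) {
        Optional<String> owner = findLiveOwner(suppliedHostId, localAddress, peers);
        if (owner.isPresent()) {
            throw new IllegalStateException(
                "Host id " + suppliedHostId + " is already in use by live node " + owner.get());
        }
    }
}
```

A check like this can only see what the starting node has learned about its peers so far, so it narrows the window for a collision rather than closing it, and it does nothing for the longer-term concern below.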
As for things further down the line, it would be pretty untenable to base any new/improved cluster membership or data placement implementations on host id if the system isn't in control of assigning those. So even if only a handful of power users might actually make use of the feature, its very existence would constrain what we can assume/assert about host ids going forward. Given that drawback, the fact that this is a very niche feature makes it even less compelling.

> On 27 Apr 2022, at 18:20, Paulo Motta <pauloricard...@gmail.com> wrote:
>
> Fully agree we should add a collision check but I don't understand why this
> optional feature is bad/dangerous after we add this ability? Can you provide
> an example of a potential issue?
>
> I don't expect this property to be used by most users, except power users
> which normally know what they're doing. We have tons of potentially dangerous
> knobs and I don't get why this particular one is any different.
>
> On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe <s...@beobal.com> wrote:
> CASSANDRA-14582 added support for users to supply an arbitrary value for
> HOST_ID when booting a new node. IMO it's a pretty bad and potentially
> dangerous idea for the unique identifier to be settable in this way. Hint
> delivery is already routed by host id and there have been several JIRAs which
> have called for more fundamental reworking of cluster metadata using
> permanent opaque identifiers rather than IPs to address members
> (CASSANDRA-11559, CASSANDRA-15823, etc). Using host id for anything like that
> in future would be made much more difficult with this capability.
>
> Aside from the longer term implications, it seems that the feature as
> currently implemented has some issues. There doesn't appear to be any
> validation that a supplied host id isn't already in use by a live node, so
> it's trivial to trigger a collision which can lead to divergent ring views
> between nodes and ultimately in data loss.
>
> Although this landed in trunk almost 11 months ago it hasn't been included in
> a release yet, so I propose we revert it before cutting 4.1 (although, as the
> revert isn't a feature, I guess technically we could do that during the
> freeze). I'm not completely convinced about encoding metadata into host ids,
> but even if that is something we want to do, I don't think it's wise to
> completely remove control over the identifiers from Cassandra itself.
>
> Thanks,
> Sam
>
>> On 25 Apr 2022, at 16:17, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> Kind reminder that 1st May is around the corner. What does this mean? Our
>> code freeze starts on 1st May and my understanding is that only bug fixing
>> can go into the 4.1 branch.
>> If anyone has anything to raise, now is a good time. On my end I saw a few
>> things for this week that we should probably put to completion:
>> - CASSANDRA-17571 <https://issues.apache.org/jira/browse/CASSANDRA-17571> -
>> I have to close this one, it is in progress; new types in Config is good to
>> be in before the freeze I guess, even if It is not yaml change
>> - CASSANDRA-17557 <https://issues.apache.org/jira/browse/CASSANDRA-17557> -
>> we need to take care of the parameters so we don't have to deprecate and
>> support anything not actually needed; I think it is probably more or less
>> done
>> - CASSANDRA-17379 <https://issues.apache.org/jira/browse/CASSANDRA-17379> -
>> adds a new flag around config; I think it is more or less done, depends on
>> final CI and second reviewer maybe needed?
>> - JMX intercept Cassandra exceptions, I think David mentioned a rebase was
>> needed
>> - CASSANDRA-17212 - The config property minimum_keyspace_rf and their
>> nodetool getter and setter commands are new to 4.1. They are suitable to be
>> ported to guardrails, and if we do this port in 4.1 we won't need to
>> deprecate that property and nodetool commands in the next release, just one
>> release after their introduction.
>>
>> I guess the failing tests we see could be fixed after the freeze but no API
>> changes.
>>
>> Thanks everyone for all the hard work. Please don’t hesitate to raise the
>> flag with questions, concerns or any help needed.
>>
>> Best regards,
>> Ekaterina