As I mentioned, making it this easy to introduce divergent views of the ring 
between live nodes is pretty dangerous. For example, starting a new node with 
the same host id as an existing live node will cause a collision. The existing 
node will not add the new node to its ring (although the new node will remain 
in gossip). Other nodes will remove the existing node from token metadata, but 
won't mark it down. There's no requirement for the new node to have the same 
tokens as the existing one either, so the topology has just changed completely, 
without any constraints or movement of existing data. Subsequent reads and 
writes will be directed to different replica sets, depending on which 
coordinator they land on. Ownership of the host id, and the status of nodes in 
the token metadata of peers, will also continue to flap if those nodes go down 
and come back up, because who rightfully owns the host id is resolved at 
startup.
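
To make the divergence concrete, here is a toy model in Java. It is not 
Cassandra code: the class, the method names, the addresses and the acceptance 
rule in onClaim are all made up for illustration, and the rule is only a rough 
simplification of the behaviour described above (tokens and gossip state are 
left out entirely).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class HostIdCollisionSketch {

    /** One node's local belief about which endpoint owns each host id. */
    static final class RingView {
        final String self; // endpoint that holds this view
        final Map<UUID, String> ownerByHostId = new LinkedHashMap<>();

        RingView(String self) { this.self = self; }

        /** Hypothetical rule applied when `endpoint` claims `hostId`. */
        void onClaim(UUID hostId, String endpoint) {
            String current = ownerByHostId.get(hostId);
            boolean iOwnIt = self.equals(current);
            if (iOwnIt && !self.equals(endpoint)) {
                return; // the existing owner refuses to add the newcomer
            }
            // Every other node swaps the old owner for the new claimant.
            ownerByHostId.put(hostId, endpoint);
        }
    }

    /** Returns, per observer endpoint, who it thinks owns the contested id. */
    public static Map<String, String> simulate() {
        UUID contested = UUID.fromString("00000000-0000-0000-0000-000000000001");
        RingView existing = new RingView("10.0.0.1");
        RingView peer = new RingView("10.0.0.2");
        RingView newcomer = new RingView("10.0.0.3");

        // Steady state: both established nodes agree 10.0.0.1 owns the id.
        existing.onClaim(contested, "10.0.0.1");
        peer.onClaim(contested, "10.0.0.1");

        // A new node boots with the same host id and announces itself.
        RingView[] all = { existing, peer, newcomer };
        for (RingView view : all) {
            view.onClaim(contested, "10.0.0.3");
        }

        Map<String, String> result = new LinkedHashMap<>();
        for (RingView view : all) {
            result.put(view.self, view.ownerByHostId.get(contested));
        }
        return result;
    }

    public static void main(String[] args) {
        simulate().forEach((observer, owner) ->
            System.out.println(observer + " believes the id belongs to " + owner));
    }
}
```

In this model 10.0.0.1 still believes it owns the id while 10.0.0.2 and 
10.0.0.3 believe the newcomer does, so a coordinator's choice of replicas 
depends on which node the request happens to land on.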

As for things further down the line, it would be pretty untenable to base any 
new/improved cluster membership or data placement implementations on host id if 
the system isn't in control of assigning those. So even if only a handful of 
power users might actually make use of the feature, its very existence would 
constrain what we can assume/assert about host ids going forward. Given that 
drawback, the fact that this is a very niche feature makes it even less 
compelling.


> On 27 Apr 2022, at 18:20, Paulo Motta <pauloricard...@gmail.com> wrote:
> 
> Fully agree we should add a collision check but I don't understand why this 
> optional feature is bad/dangerous after we add this ability? Can you provide 
> an example of a potential issue?
> 
> I don't expect this property to be used by most users, except power users 
> which normally know what they're doing. We have tons of potentially dangerous 
> knobs and I don't get why this particular one is any different.
> 
> On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe <s...@beobal.com 
> <mailto:s...@beobal.com>> wrote:
> CASSANDRA-14582 added support for users to supply an arbitrary value for 
> HOST_ID when booting a new node. IMO it's a pretty bad and potentially 
> dangerous idea for the unique identifier to be settable in this way. Hint 
> delivery is already routed by host id and there have been several JIRAs which 
> have called for more fundamental reworking of cluster metadata using 
> permanent opaque identifiers rather than IPs to address members 
> (CASSANDRA-11559, CASSANDRA-15823, etc). Using host id for anything like that 
> in future would be made much more difficult with this capability. 
> 
> Aside from the longer term implications, it seems that the feature as 
> currently implemented has some issues. There doesn't appear to be any 
> validation that a supplied host id isn't already in use by a live node, so 
> it's trivial to trigger a collision which can lead to divergent ring views 
> between nodes and ultimately to data loss.
> 
> Although this landed in trunk almost 11 months ago it hasn't been included in 
> a release yet, so I propose we revert it before cutting 4.1 (although, as the 
> revert isn't a feature, I guess technically we could do that during the 
> freeze). I'm not completely convinced about encoding metadata into host ids, 
> but even if that is something we want to do, I don't think it's wise to 
> completely remove control over the identifiers from Cassandra itself.  
> 
> Thanks, 
> Sam
> 
>> On 25 Apr 2022, at 16:17, Ekaterina Dimitrova <e.dimitr...@gmail.com 
>> <mailto:e.dimitr...@gmail.com>> wrote:
>> 
>> Hi everyone,
>> 
>> Kind reminder that 1st May is around the corner. What does this mean? Our 
>> code freeze starts on 1st May and my understanding is that only bug fixing 
>> can go into the 4.1 branch. 
>> If anyone has anything to raise, now is a good time. On my end I saw a few 
>> things for this week that we should probably put to completion:
>> - CASSANDRA-17571 <https://issues.apache.org/jira/browse/CASSANDRA-17571> - 
>> I have to close this one, it is in progress; new types in Config are good to 
>> be in before the freeze I guess, even if it is not a yaml change
>> - CASSANDRA-17557 <https://issues.apache.org/jira/browse/CASSANDRA-17557> - 
>> we need to take care of the parameters so we don't have to deprecate and  
>> support anything not actually needed; I think it is probably more or less 
>> done
>> - CASSANDRA-17379 <https://issues.apache.org/jira/browse/CASSANDRA-17379> - 
>> adds a new flag around config; I think it is more or less done, depends on 
>> final CI and second reviewer maybe needed? 
>> - JMX intercept Cassandra exceptions, I think David mentioned a rebase was 
>> needed
>> - CASSANDRA-17212 - The config property minimum_keyspace_rf and its 
>> nodetool getter and setter commands are new to 4.1. They are suitable to be 
>> ported to guardrails, and if we do this port in 4.1 we won't need to 
>> deprecate that property and nodetool commands in the next release, just one 
>> release after their introduction.
>> 
>> I guess the failing tests we see could be fixed after the freeze but no API 
>> changes.
>> 
>> Thanks everyone for all the hard work. Please don’t hesitate to raise the 
>> flag with questions, concerns or any help needed. 
>> 
>> Best regards,
>> Ekaterina
> 
