Yeah, what we have with inet is much like having a type such as "numeric" that lets you write both ints and doubles. If we had actual "inet4" and "inet6" types, SAI would have been able to index them as fixed-length values without doing the 4-byte to 16-byte conversion. Given that SAI could easily go one way or the other at post-filtering time, perhaps there's another option:
4) Have an option on the column index that lets the user specify whether IPv4 and IPv6 addresses are comparable. If they are, nothing changes. If they aren't, we can just take the matches from the index and filter "strictly". I'm not sure what's best here, because it seems to hinge on what users actually want to do when they throw both v4 and v6 addresses into a single column. Without any real loss in storage efficiency, you could put them in two separate columns on the same table and index each one, and then none of this matters. If they are mixed, it feels like we should at least have the option to make them comparable, much like we can already make text case-insensitive or Unicode-normalized.

On Wed, Mar 6, 2024 at 4:35 PM Bowen Song via dev <dev@cassandra.apache.org> wrote:

> Technically, 127.0.0.1 (IPv4) is not 0:0:0:0:0:ffff:7f00:0001 (IPv6), but their values are equal. Just like 1.0 (double) is not 1 (int), but their values are equal. So, what is the meaning of "=" in CQL?
>
> On 06/03/2024 21:36, David Capwell wrote:
> > So, I was reviewing SAI and found that we convert IPv4 to IPv6 (which is valid for the type), which made me wonder what the behavior would be if a client mixed IPv4 with IPv4 encoded as IPv6… This led me to a behavior in SAI that differs from the rest of C*… where I feel C* is doing the wrong thing…
> >
> > Let's walk through a simple example:
> >
> > ipv4: 127.0.0.1
> > ipv6: 0:0:0:0:0:ffff:7f00:0001
> >
> > Both of these addresses are equal according to networking and Java… but for C* they are different! They are 2 different values because the IPv4 form is 4 bytes and the IPv6 form is 16 bytes, so 4 != 16!
> >
> > With SAI we convert all IPv4 to IPv6 so that the search logic is correct… this causes SAI to return partitions that ALLOW FILTERING and other indexes wouldn’t…
> >
> > This gets to the question in the subject… what SHOULD we do for this type?
> >
> > I see 3 options:
> >
> > 1) SAI uses the custom C* semantics where 4 != 16… this keeps us consistent…
> > 2) ALLOW FILTERING and other indexes are “fixed” so that we actually match correctly… though we can't really fix this if the type is in a partition or clustering column…
> > 3) deprecate inet in favor of an inet_better type… where inet keeps the custom C* semantics and inet_better handles this case
> >
> > Thoughts?
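A small Java sketch may help pin down the semantics being discussed. It is not Cassandra or SAI code: `InetSemantics`, `Mode`, `widen`, and `match` are hypothetical names, the widening assumes the IPv4-mapped form ::ffff:a.b.c.d (which matches the 0:0:0:0:0:ffff:7f00:0001 example in the thread), and the "index" is a linear scan rather than SAI's real structures. It shows the raw 4 != 16 byte comparison, Java's `InetAddress` treating the two forms as equal, and the two modes of option 4: COMPARABLE gives what the thread says SAI returns today, and STRICT post-filters on raw bytes to give the current ALLOW FILTERING result.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InetSemantics {
    // Hypothetical per-index setting modelling option 4; not an existing Cassandra option.
    enum Mode { COMPARABLE, STRICT }

    // Widen a 4-byte IPv4 address to the 16-byte IPv4-mapped form ::ffff:a.b.c.d
    // (RFC 4291, section 2.5.5.2). A 16-byte address is returned as a copy.
    static byte[] widen(byte[] addr) {
        if (addr.length == 16) return addr.clone();
        if (addr.length != 4) {
            throw new IllegalArgumentException("expected 4 or 16 bytes, got " + addr.length);
        }
        byte[] out = new byte[16];
        out[10] = (byte) 0xff;
        out[11] = (byte) 0xff;
        System.arraycopy(addr, 0, out, 12, 4);
        return out;
    }

    // Models the index as a linear scan over widened, fixed-length keys.
    // In STRICT mode, candidates are post-filtered on their raw stored bytes,
    // so a 4-byte value never matches a 16-byte query, or vice versa.
    static List<byte[]> match(List<byte[]> stored, byte[] query, Mode mode) {
        byte[] key = widen(query);
        List<byte[]> out = new ArrayList<>();
        for (byte[] value : stored) {
            if (!Arrays.equals(widen(value), key)) continue;                    // index match
            if (mode == Mode.STRICT && !Arrays.equals(value, query)) continue;  // post-filter
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) throws UnknownHostException {
        byte[] v4 = {127, 0, 0, 1};   // 127.0.0.1
        byte[] v6 = widen(v4);        // 0:0:0:0:0:ffff:7f00:0001

        // Raw-byte comparison: the lengths differ, so the values differ (4 != 16).
        System.out.println(Arrays.equals(v4, v6));                                              // false
        // java.net turns IPv4-mapped input into an Inet4Address, so Java calls them equal.
        System.out.println(InetAddress.getByAddress(v4).equals(InetAddress.getByAddress(v6)));  // true

        List<byte[]> stored = List.of(v4, v6);
        System.out.println(match(stored, v4, Mode.COMPARABLE).size());  // 2
        System.out.println(match(stored, v4, Mode.STRICT).size());      // 1
    }
}
```

In both modes the index keys stay fixed-length; only the post-filter differs, which is why offering option 4 as a per-index setting looks cheap.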