> I don’t think we can readily migrate old types away from this however, > without breaking backwards compatibility.
Given that java driver has a different behavior from server, I wouldn’t be shocked to see that other drivers also have their own custom behaviors… so not clear how to migrate unless we actually hand a user facing standard per type… if all drivers use a “default value” and is consistent, I do think we could migrate, but would need to live with this till at least 6.0+ > We can only prevent its use in the CQL layer where support isn’t required. +1 > On Sep 20, 2023, at 7:38 AM, Benedict <bened...@apache.org> wrote: > > Yes, if this is what was meant by empty I agree. It’s nonsensical for most > types. Apologies for any confusion. > > I don’t think we can readily migrate old types away from this however, > without breaking backwards compatibility. We can only prevent its use in the > CQL layer where support isn’t required. My understanding was that we had at > least tried to do this for all non-thrift schemas, but perhaps we did not do > so thoroughly and now may have some CQL legacy support requirements as well. > >> On 20 Sep 2023, at 15:30, Aleksey Yeshchenko <alek...@apple.com> wrote: >> >> Allowing zero-length byte arrays for most old types is just a legacy from >> Darker Days. It’s a distinct concern from columns being nullable or not. >> >> There are a couple types where this makes sense: strings and blobs. All else >> should not allow this except for backward compatibility reasons. So, not for >> new types. >> >>>> On 20 Sep 2023, at 00:08, David Capwell <dcapw...@apple.com> wrote: >>>> >>>> When does empty mean null? >>> >>> >>> Most types are this way >>> >>> @Test >>> public void nullExample() >>> { >>> createTable("CREATE TABLE %s (pk int primary key, cuteness int)"); >>> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new >>> byte[0])); >>> Row result = execute("SELECT * FROM %s WHERE pk=0").one(); >>> if (result.has("cuteness")) System.out.println("Cuteness score: " + >>> result.getInt("cuteness")); >>> else System.out.println("Cuteness score is undefined"); >>> } >>> >>> >>> This test will NPE in getInt as the returned BB is seen as “null” for int32 >>> type, you can make it “safer” by changing to the following >>> >>> if (result.has("cuteness")) System.out.println("Cuteness score: " + >>> Int32Type.instance.compose(result.getBlob("cuteness"))); >>> >>> Now we get the log "Cuteness score: null” >>> >>> What’s even better (just found this out) is that client isn’t consistent or >>> correct in these cases! >>> >>> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, >>> "SELECT * FROM %s WHERE pk=0").one(); >>> if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness >>> score: " + result.getInt("cuteness")); >>> else System.out.println("Cuteness score is undefined”); >>> >>> This prints "Cuteness score: 0” >>> >>> So for Cassandra we think the value is “null” but java driver thinks it’s 0? >>> >>>> Do we have types where writing an empty value creates a tombstone? >>> >>> Empty does not generate a tombstone for any type, but empty has a similar >>> user experience as we return null in both cases (but just found out that >>> the drivers may not be consistent with this…) >>> >>>>> On Sep 19, 2023, at 3:33 PM, J. D. Jordan <jeremiah.jor...@gmail.com> >>>>> wrote: >>>> >>>> When does empty mean null? My understanding was that empty is a valid >>>> value for the types that support it, separate from null (aka a tombstone). >>>> Do we have types where writing an empty value creates a tombstone? >>>> >>>> I agree with David that my preference would be for only blob and string >>>> like types to support empty. It’s too late for the existing types, but we >>>> should hold to this going forward. Which is what I think the idea was in >>>> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well? That it was >>>> sad the existing numerics were emptiable, but too late to change, and we >>>> could correct it for newer types. >>>> >>>>> On Sep 19, 2023, at 12:12 PM, David Capwell <dcapw...@apple.com> wrote: >>>>> >>>>> >>>>>> >>>>>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started >>>>>> making types non -emptiable. This approach makes more sense to me as >>>>>> having to deal with empty value is error prone in my opinion. >>>>> >>>>> I agree it’s confusing, and in the patch I found that different code >>>>> paths didn’t handle things correctly as we have some times (most) that >>>>> support empty bytes, and some that do not…. Empty also has different >>>>> meaning in different code paths; for most it means “null”, and for some >>>>> other types it means “empty”…. To try to make things more clear I added >>>>> org.apache.cassandra.db.marshal.AbstractType#isNull(V, >>>>> org.apache.cassandra.db.marshal.ValueAccessor<V>) to the type system so >>>>> each type can define if empty is null or not. >>>>> >>>>>> I also think that it would be good to standardize on one approach to >>>>>> avoid confusion. >>>>> >>>>> I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. >>>>> Let’s say I have a “blob” type and I write an empty byte… what does this >>>>> mean? What does it mean for "text" type? The fact I get back a null in >>>>> both those cases was very confusing to me… I do feel that some types >>>>> should support empty, and the common code of empty == null I think is >>>>> very brittle (blob/text was not correct in different places due to >>>>> this...)… so I am cool with removing that relationship, but don’t think >>>>> we should have a rule blocking empty for all current / future types as it >>>>> some times does make sense. >>>>> >>>>>> empty vector (I presume) for the vector type? >>>>> >>>>> Empty vectors (vector[0]) are blocked at the type level, the smallest >>>>> vector is vector[1] >>>>> >>>>>> as types that can never be null >>>>> >>>>> One pro here is that “null” is cheaper (in some regards) than delete >>>>> (though we can never purge), but having 2 similar behaviors (write null, >>>>> do a delete) at the type level is a bit confusing… Right now I am allowed >>>>> to do the following (the below isn’t valid CQL, its a hybrid of CQL + >>>>> Java code…) >>>>> >>>>> CREATE TABLE fluffykittens (pk int primary key, cuteness int); >>>>> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0]) >>>>> >>>>> CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY >>>>> ((pk1, pk2)); >>>>> INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new >>>>> byte[0], new byte[0]) — valid as the partition key is not empty as its a >>>>> composite of 2 empty values, this is the same as new byte[2] >>>>> >>>>> The first time I ever found out that empty bytes was valid was when a >>>>> user was trying to abuse this in collections (also the fact collections >>>>> support null in some cases and not others is fun…)…. It was blowing up in >>>>> random places… good times! >>>>> >>>>> I am personally not in favor of allowing empty bytes (other than for blob >>>>> / text as that is actually valid for the domain), but having similar >>>>> types having different semantics I feel is more problematic... >>>>> >>>>>> On Sep 19, 2023, at 8:56 AM, Josh McKenzie <jmcken...@apache.org> wrote: >>>>>> >>>>>>> I am strongly in favour of permitting the table definition forbidding >>>>>>> nulls - and perhaps even defaulting to this behaviour. But I don’t >>>>>>> think we should have types that are inherently incapable of being null. >>>>>> I'm with Benedict. Seems like this could help prevent whatever "nulls in >>>>>> primary key columns" problems Aleksey was alluding to on those tickets >>>>>> back in the day that pushed us towards making the new types >>>>>> non-emptiable as well (i.e. primary keys are non-null in table >>>>>> definition). >>>>>> >>>>>> Furthering Alex' question, having a default value for unset fields in >>>>>> any non-collection context seems... quite surprising to me in a >>>>>> database. I could see the argument for making container / collection >>>>>> types non-nullable, maybe, but that just keeps us in a potential >>>>>> straddle case (some types nullable, some not). >>>>>> >>>>>> On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote: >>>>>>> >>>>>>> If I understand this suggestion correctly it is a whole can of worms, >>>>>>> as types that can never be null prevent us ever supporting outer joins >>>>>>> that return these types. >>>>>>> >>>>>>> I am strongly in favour of permitting the table definition forbidding >>>>>>> nulls - and perhaps even defaulting to this behaviour. But I don’t >>>>>>> think we should have types that are inherently incapable of being null. >>>>>>> I also certainly don’t think we should have bifurcated our behaviour >>>>>>> between types like this. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On 19 Sep 2023, at 11:54, Alex Petrov <al...@coffeenco.de> wrote: >>>>>>>> >>>>>>>> To make sure I understand this right; does that mean there will be a >>>>>>>> default value for unset fields? Like 0 for numerical values, and an >>>>>>>> empty vector (I presume) for the vector type? >>>>>>>> >>>>>>>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote: >>>>>>>>> Hi everybody, >>>>>>>>> >>>>>>>>> I noticed that the new Vector type accepts empty ByteBuffer values as >>>>>>>>> an input representing null. >>>>>>>>> When we introduced TINYINT and SMALLINT (CASSANDRA-895) we started >>>>>>>>> making types non -emptiable. This approach makes more sense to me as >>>>>>>>> having to deal with empty value is error prone in my opinion. >>>>>>>>> I also think that it would be good to standardize on one approach to >>>>>>>>> avoid confusion. >>>>>>>>> >>>>>>>>> Should we make the Vector type non-emptiable and stick to it for the >>>>>>>>> new types? >>>>>>>>> >>>>>>>>> I like to hear your opinion. >>>>> >>>>> >>> >> >