> So, not for new types. > Should we make the Vector type non-emptiable and stick to it for the new > types?
Yep, works for me. We should also update the test org.apache.cassandra.db.marshal.AbstractTypeTest#empty to detect this for new types by making org.apache.cassandra.db.marshal.AbstractType#allowsEmpty default to false and override in all legacy types More than glad to review any patch to fix this issue! > On Sep 20, 2023, at 7:16 AM, Aleksey Yeshchenko <alek...@apple.com> wrote: > > Allowing zero-length byte arrays for most old types is just a legacy from > Darker Days. It’s a distinct concern from columns being nullable or not. > > There are a couple types where this makes sense: strings and blobs. All else > should not allow this except for backward compatibility reasons. So, not for > new types. > >> On 20 Sep 2023, at 00:08, David Capwell <dcapw...@apple.com> wrote: >> >>> When does empty mean null? >> >> >> Most types are this way >> >> @Test >> public void nullExample() >> { >> createTable("CREATE TABLE %s (pk int primary key, cuteness int)"); >> execute("INSERT INTO %s (pk, cuteness) VALUES (0, ?)", ByteBuffer.wrap(new >> byte[0])); >> Row result = execute("SELECT * FROM %s WHERE pk=0").one(); >> if (result.has("cuteness")) System.out.println("Cuteness score: " + >> result.getInt("cuteness")); >> else System.out.println("Cuteness score is undefined"); >> } >> >> >> This test will NPE in getInt as the returned BB is seen as “null” for int32 >> type, you can make it “safer” by changing to the following >> >> if (result.has("cuteness")) System.out.println("Cuteness score: " + >> Int32Type.instance.compose(result.getBlob("cuteness"))); >> >> Now we get the log "Cuteness score: null” >> >> What’s even better (just found this out) is that client isn’t consistent or >> correct in these cases! >> >> com.datastax.driver.core.Row result = executeNet(ProtocolVersion.CURRENT, >> "SELECT * FROM %s WHERE pk=0").one(); >> if (result.getBytesUnsafe("cuteness") != null) System.out.println("Cuteness >> score: " + result.getInt("cuteness")); >> else System.out.println("Cuteness score is undefined”); >> >> This prints "Cuteness score: 0” >> >> So for Cassandra we think the value is “null” but java driver thinks it’s 0? >> >>> Do we have types where writing an empty value creates a tombstone? >> >> Empty does not generate a tombstone for any type, but empty has a similar >> user experience as we return null in both cases (but just found out that the >> drivers may not be consistent with this…) >> >>> On Sep 19, 2023, at 3:33 PM, J. D. Jordan <jeremiah.jor...@gmail.com> wrote: >>> >>> When does empty mean null? My understanding was that empty is a valid >>> value for the types that support it, separate from null (aka a tombstone). >>> Do we have types where writing an empty value creates a tombstone? >>> >>> I agree with David that my preference would be for only blob and string >>> like types to support empty. It’s too late for the existing types, but we >>> should hold to this going forward. Which is what I think the idea was in >>> https://issues.apache.org/jira/browse/CASSANDRA-8951 as well? That it was >>> sad the existing numerics were emptiable, but too late to change, and we >>> could correct it for newer types. >>> >>>> On Sep 19, 2023, at 12:12 PM, David Capwell <dcapw...@apple.com> wrote: >>>> >>>> >>>>> >>>>> When we introduced TINYINT and SMALLINT (CASSANDRA-8951) we started >>>>> making types non -emptiable. This approach makes more sense to me as >>>>> having to deal with empty value is error prone in my opinion. >>>> >>>> I agree it’s confusing, and in the patch I found that different code paths >>>> didn’t handle things correctly as we have some times (most) that support >>>> empty bytes, and some that do not…. Empty also has different meaning in >>>> different code paths; for most it means “null”, and for some other types >>>> it means “empty”…. To try to make things more clear I added >>>> org.apache.cassandra.db.marshal.AbstractType#isNull(V, >>>> org.apache.cassandra.db.marshal.ValueAccessor<V>) to the type system so >>>> each type can define if empty is null or not. >>>> >>>>> I also think that it would be good to standardize on one approach to >>>>> avoid confusion. >>>> >>>> I agree, but also don’t feel it’s a perfect one-size-fits-all thing…. >>>> Let’s say I have a “blob” type and I write an empty byte… what does this >>>> mean? What does it mean for "text" type? The fact I get back a null in >>>> both those cases was very confusing to me… I do feel that some types >>>> should support empty, and the common code of empty == null I think is very >>>> brittle (blob/text was not correct in different places due to this...)… so >>>> I am cool with removing that relationship, but don’t think we should have >>>> a rule blocking empty for all current / future types as it some times does >>>> make sense. >>>> >>>>> empty vector (I presume) for the vector type? >>>> >>>> Empty vectors (vector[0]) are blocked at the type level, the smallest >>>> vector is vector[1] >>>> >>>>> as types that can never be null >>>> >>>> One pro here is that “null” is cheaper (in some regards) than delete >>>> (though we can never purge), but having 2 similar behaviors (write null, >>>> do a delete) at the type level is a bit confusing… Right now I am allowed >>>> to do the following (the below isn’t valid CQL, its a hybrid of CQL + Java >>>> code…) >>>> >>>> CREATE TABLE fluffykittens (pk int primary key, cuteness int); >>>> INSERT INTO fluffykittens (pk, cuteness) VALUES (0, new byte[0]) >>>> >>>> CREATE TABLE typesarehard (pk1 int, pk2 int, cuteness int, PRIMARY KEY >>>> ((pk1, pk2)); >>>> INSERT INTO typesarehard (pk1, pk2, cuteness) VALUES (new byte[0], new >>>> byte[0], new byte[0]) — valid as the partition key is not empty as its a >>>> composite of 2 empty values, this is the same as new byte[2] >>>> >>>> The first time I ever found out that empty bytes was valid was when a user >>>> was trying to abuse this in collections (also the fact collections support >>>> null in some cases and not others is fun…)…. It was blowing up in random >>>> places… good times! >>>> >>>> I am personally not in favor of allowing empty bytes (other than for blob >>>> / text as that is actually valid for the domain), but having similar types >>>> having different semantics I feel is more problematic... >>>> >>>>> On Sep 19, 2023, at 8:56 AM, Josh McKenzie <jmcken...@apache.org> wrote: >>>>> >>>>>> I am strongly in favour of permitting the table definition forbidding >>>>>> nulls - and perhaps even defaulting to this behaviour. But I don’t think >>>>>> we should have types that are inherently incapable of being null. >>>>> I'm with Benedict. Seems like this could help prevent whatever "nulls in >>>>> primary key columns" problems Aleksey was alluding to on those tickets >>>>> back in the day that pushed us towards making the new types non-emptiable >>>>> as well (i.e. primary keys are non-null in table definition). >>>>> >>>>> Furthering Alex' question, having a default value for unset fields in any >>>>> non-collection context seems... quite surprising to me in a database. I >>>>> could see the argument for making container / collection types >>>>> non-nullable, maybe, but that just keeps us in a potential straddle case >>>>> (some types nullable, some not). >>>>> >>>>> On Tue, Sep 19, 2023, at 8:22 AM, Benedict wrote: >>>>>> >>>>>> If I understand this suggestion correctly it is a whole can of worms, as >>>>>> types that can never be null prevent us ever supporting outer joins that >>>>>> return these types. >>>>>> >>>>>> I am strongly in favour of permitting the table definition forbidding >>>>>> nulls - and perhaps even defaulting to this behaviour. But I don’t think >>>>>> we should have types that are inherently incapable of being null. I also >>>>>> certainly don’t think we should have bifurcated our behaviour between >>>>>> types like this. >>>>>> >>>>>> >>>>>> >>>>>>> On 19 Sep 2023, at 11:54, Alex Petrov <al...@coffeenco.de> wrote: >>>>>>> >>>>>>> To make sure I understand this right; does that mean there will be a >>>>>>> default value for unset fields? Like 0 for numerical values, and an >>>>>>> empty vector (I presume) for the vector type? >>>>>>> >>>>>>> On Fri, Sep 15, 2023, at 11:46 AM, Benjamin Lerer wrote: >>>>>>>> Hi everybody, >>>>>>>> >>>>>>>> I noticed that the new Vector type accepts empty ByteBuffer values as >>>>>>>> an input representing null. >>>>>>>> When we introduced TINYINT and SMALLINT (CASSANDRA-895) we started >>>>>>>> making types non -emptiable. This approach makes more sense to me as >>>>>>>> having to deal with empty value is error prone in my opinion. >>>>>>>> I also think that it would be good to standardize on one approach to >>>>>>>> avoid confusion. >>>>>>>> >>>>>>>> Should we make the Vector type non-emptiable and stick to it for the >>>>>>>> new types? >>>>>>>> >>>>>>>> I like to hear your opinion. >>>> >>>> >> >