My planet was destroyed to make way for hyperspatial express route before I could get the answer, sorry… anyone interested in getting lunch at the end of the universe?
> On Feb 11, 2025, at 2:03 PM, Paulo Motta <pa...@apache.org> wrote: > > On Tue, Feb 11, 2025 at 5:00 PM Patrick McFadin <pmcfa...@gmail.com> wrote: >> >> You get my vote for the best subject line I've seen this week. >> > > +1, I'm deeply saddened that despite this title this post does not > contain the answer to the ultimate question of life, the universe, and > everything. :-( > > >> On Tue, Feb 11, 2025 at 1:20 PM Benedict <bened...@apache.org> wrote: >>> >>> Perhaps we should reify this in the type system? Introduce a MAYBE EMPTY >>> modifier for column types, or simply name them eg [int] vs int? >>> >>> I think the problem is today it’s all implicit and - as David says - >>> inconsistent. It would be nice to move away from this as the default for a >>> variety of reasons, but also nice to make the behaviour well defined for >>> those use cases we think we’re supporting. >>> >>> On 11 Feb 2025, at 21:16, David Capwell <dcapw...@apple.com> wrote: >>> >>> Thanks for the reply! >>> >>> AFAIK this EMPTY stuff goes back to thrift days. >>> >>> >>> This is what I was told, but the expected semantics are not clear so my >>> goal is to help flesh things out. >>> >>> We let people insert these zero length values back then, so we have to >>> support those zero length values existing for ever :/. >>> >>> >>> We allow this for some types but not all. I think where I am coming from >>> is write != select, so if we say empty = “no value” or “null” then why does >>> select treat it as a value? Is this the expected behavior? >>> >>> maybe we should be converting such values into a tombstone / NULL >>> >>> >>> Tombstones can be purged, where as empty can’t, so should it? >>> >>> Everything besides SAI, including the table based 2i and ALLOW FILTERING, >>> treat EMPTY as a distant value which can be inserted and queried on. We >>> have supported it in the past, so we should continue to support it into the >>> future, even if it is painful to do. >>> >>> >>> I guess where I come from here is what semantics do we expect. >>> >>> So lets say v0 is empty bytes int >>> >>> SELECT CAST(v0 AS text) >>> >>> Is this null or empty bytes? In our project this is null >>> >>> SELECT JSON v0 >>> >>> Is this null or empty bytes? In our project this is null >>> >>> SELECT avg(v0) … >>> >>> Is this null or empty bytes? In our project this is null >>> >>> So in most places you touch empty bytes we treat it as null, but only in >>> filtering do we not. >>> >>> On Feb 11, 2025, at 11:27 AM, Jeremiah Jordan <jeremiah.jor...@gmail.com> >>> wrote: >>> >>> AFAIK this EMPTY stuff goes back to thrift days. We let people insert >>> these zero length values back then, so we have to support those zero length >>> values existing for ever :/. >>> >>> How useful is such a distinction? I don’t know. Is anybody actually doing >>> this? Well Andres brought up >>> https://issues.apache.org/jira/browse/CASSANDRA-20313 as a problem because >>> we had an end user create an SAI index on a column which contained EMPTY >>> values in it. So people are inserting these into the database. Would they >>> expect to be able to query by EMPTY? I do not know. >>> >>> This is the first I have heard of the “isEmptyValueMeaningless” setting. >>> The meaning of EMPTY to me has always been the same for an Integer or a >>> String, “this column has a value of no value” vs NULL which means "this >>> column is not set/has no value”. If we truly want to follow the spirit of >>> that setting, then maybe we should be converting such values into a >>> tombstone / NULL up front when deserializing them, rather than storing the >>> EMPTY byte buffer in the DB? >>> >>> Anyway, I am kind of rambling here. I am of two minds. >>> I can see that this does seem like a silly distinction to have for some >>> types, so maybe we should just decide that in a CQL world, EMPTY means NULL >>> for some types, and actually just make that a tombstone. Maybe 6.0 would >>> be a good major version change to make such a “breaking” behavior change in. >>> >>> I can also see the “don’t screw up the legacy apps” use case. Everything >>> besides SAI, including the table based 2i and ALLOW FILTERING, treat EMPTY >>> as a distant value which can be inserted and queried on. We have supported >>> it in the past, so we should continue to support it into the future, even >>> if it is painful to do. >>> >>> Flip a coin and I can argue either side. So I would love to hear others >>> thoughts to convince me one way to the other. >>> >>> -Jeremiah >>> >>> >>> >>> On Feb 11, 2025 at 12:55:35 PM, Caleb Rackliffe <calebrackli...@gmail.com> >>> wrote: >>>> >>>> The case where allowsEmpty == true AND is meaningless == true is >>>> especially confusing. If I could design this from scratch, I would reject >>>> writes and filtering on EMPTY values for int and the other types where >>>> meaningless == true. (In other words, if we allow EMPTY, it is meaningful >>>> and queryable. If we don't, it isn't.) That avoids problems that can't >>>> have anything other than an arbitrary solution, like what we do with < and >>>> > for EMPTY for int. When we add IS [NOT] NULL support, that would >>>> preferably NOT match EMPTY values for the types where empty means >>>> something, like strings. For everything else, EMPTY could be equivalent to >>>> null and match IS NULL. >>>> >>>> The only real way to make SAI compatible with the current behavior is to >>>> add something like a special postings list to its data structures that >>>> corresponds to the rows where the indexed column value is EMPTY. >>>> >>>> On Tue, Feb 11, 2025 at 12:21 PM David Capwell <dcapw...@apple.com> wrote: >>>>> >>>>> Bringing this discussion to dev@ rather than Slack as we try to figure >>>>> out CASSANDRA-20313 and CASSANDRA-19461. >>>>> >>>>> In the type system, we have 2 different (but related) methods: >>>>> >>>>> AbstractType#allowsEmpty - if the user gives empty >>>>> bytes (new byte[0]) will the type reject it >>>>> AbstractType#isEmptyValueMeaningless - if the user gives empty bytes, >>>>> should this be handled like null? >>>>> >>>>> In practice, there are 2 cases that matter: >>>>> >>>>> allowsEmpty = true AND is meaningless = false - stuff like text and bytes >>>>> allowsEmpty = true AND is meaningless = true - many types, example "int" >>>>> >>>>> What this means is that users are able to use empty bytes when writing to >>>>> these types, but this leads to complexity in the filter path, and is >>>>> something we are trying to flesh out the “correct” semantics for SAI. >>>>> >>>>> Simple example: >>>>> >>>>> {code} >>>>> >>>>> @Test >>>>> public void test() throws IOException >>>>> { >>>>> try (Cluster cluster = Cluster.build(1).start()) >>>>> { >>>>> init(cluster); >>>>> cluster.schemaChange(withKeyspace("CREATE TABLE %s.tbl (pk int >>>>> primary key, v int)")); >>>>> IInvokableInstance node = cluster.get(1); >>>>> for (int i = 0; i < 10; i++) >>>>> node.executeInternal(withKeyspace("INSERT INTO %s.tbl (pk, v) >>>>> VALUES (?, ?)"), i, ByteBufferUtil.EMPTY_BYTE_BUFFER); >>>>> >>>>> var qr = node.executeInternalWithResult(withKeyspace("SELECT * >>>>> FROM %s.tbl WHERE v=? ALLOW FILTERING"), >>>>> ByteBufferUtil.EMPTY_BYTE_BUFFER); >>>>> StringBuilder sb = new StringBuilder(); >>>>> sb.append(qr.names()); >>>>> while (qr.hasNext()) >>>>> { >>>>> var next = qr.next(); >>>>> sb.append('\n').append(next); >>>>> } >>>>> System.out.println(sb); >>>>> } >>>>> } >>>>> >>>>> {code} >>>>> >>>>> “Should” this return 10 rows or 0? In this case, the type is int, and >>>>> int defines empty as meaningless, which means it should act as a null; >>>>> yet this query returns 10 rows, which violates CQL as foo = null == false. >>>>> >>>>> Right now there really isn’t a way to query for NULL (CASSANDRA-10715 is >>>>> still open), but if we did add such a thing we would also need to figure >>>>> out the semantics with regard to these cases. >>> >>>