Thanks for the reply, Benjamin; that makes sense to me. We can always add it to UPDATE later if it turns out to make sense; we don’t need it now.

> On May 15, 2024, at 7:44 AM, Jon Haddad <j...@jonhaddad.com> wrote:
>
> I was trying to have a discussion about a technical possibility, not a cost
> benefit analysis. More of a "how could we technically reach mars?"
> discussion than a "how we get congress to authorize a budget to reach mars?"
>
> Happy to talk about this privately with anyone interested as I enjoy a
> technical discussion for the sake of a good technical discussion.
>
> Thanks,
> Jon
>
> On Wed, May 15, 2024 at 7:18 AM Josh McKenzie <jmcken...@apache.org> wrote:
>> Is there a technical limitation that would prevent a range write that
>> functions the same way as a range tombstone, other than probably needing a
>> version bump of the storage format?
> The technical limitation would be cost/benefit due to how this intersects
> w/our architecture I think.
>
> Range tombstones have taught us that something that should be relatively
> simple (merge in deletion mask at read time) introduces a significant amount
> of complexity on all the paths Benjamin enumerated with a pretty long tail of
> bugs and data incorrectness issues and edge cases. The work to get there, at
> a high level glance, would be:
> • Updates to CQL grammar, spec
> • Updates to write path
> • Updates to accord. And thinking about how this intersects w/accord's
>   WAL / logic (I think? Consider me not well educated on details here)
> • Updates to compaction w/consideration for edge cases on all the
>   different compaction strategies
> • Updates to iteration and merge logic
> • Updates to paging logic
> • Indexing
> • repair, both full and incremental implications, support, etc
> • the list probably goes on? There's always >= 1 thing we're not thinking
>   of with a change like this. Usually more.
> For all of the above we also would need unit, integration, and fuzz testing
> extensively to ensure the introduction of this new spanning concept on a
> write doesn't introduce edge cases where incorrect data is returned on merge.
>
> All of which is to say: it's an interesting problem, but IMO given our
> architecture and what we know about the past of trying to introduce an
> architectural concept like this, the costs to getting something like this to
> production ready are pretty high.
>
> To me the cost/benefit don't really balance out. Just my .02 though.
>
> On Tue, May 14, 2024, at 2:50 PM, Benjamin Lerer wrote:
>> It would be a lot more constructive to apply our brains towards solving an
>> interesting problem than pointing out all its potential flaws based on gut
>> feelings.
>>
>> It is not simply a gut feeling, Jon. This change impacts read, write,
>> indexing, storage, compaction, repair... The risk and cost associated with
>> it are pretty significant and I am not convinced at this point of its
>> benefit.
>>
>> On Tue, May 14, 2024 at 19:05, Jon Haddad <j...@jonhaddad.com> wrote:
>> Personally, I don't think that something being scary at first glance is a
>> good reason not to explore an idea. The scenario you've described here is
>> tricky but I'm not expecting it to be any worse than say, SAI, which (the
>> last I checked) has O(N) complexity on returning result sets with regard to
>> rows returned. We've also merged in Vector search which has O(N) overhead
>> with the number of SSTables. We're still fundamentally looking at, in most
>> cases, a limited number of SSTables and some merging of values.
>>
>> Write updates are essentially a timestamped mask, potentially overlapping,
>> and I suspect potentially resolvable during compaction by propagating the
>> values. They could be eliminated or narrowed based on how they've
>> propagated by using the timestamp metadata on the SSTable.
>>
>> It would be a lot more constructive to apply our brains towards solving an
>> interesting problem than pointing out all its potential flaws based on gut
>> feelings. We haven't even moved this past an idea.
>>
>> I think it would solve a massive problem for a lot of people and is 100%
>> worth considering. Thanks Patrick and David for raising this.
>>
>> Jon
>>
>> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev
>> <dev@cassandra.apache.org> wrote:
>>
>> Ranged update sounds like a disaster for compaction and read performance.
>> Imagine compacting or reading some SSTables in which a large number of
>> overlapping but non-identical ranges were updated with different values. It
>> gives me a headache by just thinking about it.
>> Ranged delete is much simpler, because the "value" is the same tombstone
>> marker, and it also is guaranteed to expire and disappear eventually, so the
>> performance impact of dealing with them at read and compaction time doesn't
>> suffer in the long term.
>>
>> On 14/05/2024 16:59, Benjamin Lerer wrote:
>>> It should be like range tombstones ... in much worse ;-). A tombstone is a
>>> simple marker (deleted). An update can be far more complex.
>>>
>>> On Tue, May 14, 2024 at 15:52, Jon Haddad <j...@jonhaddad.com> wrote:
>>> Is there a technical limitation that would prevent a range write that
>>> functions the same way as a range tombstone, other than probably needing a
>>> version bump of the storage format?
>>>
>>> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer <ble...@apache.org> wrote:
>>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. They
>>> do work on DELETE because under the hood in C* they get translated into
>>> range tombstones.
>>>
>>> On Tue, May 14, 2024 at 02:44, David Capwell <dcapw...@apple.com> wrote:
>>> I would also include in UPDATE… but yeah, <3 BETWEEN and welcome this work.
>>>
>>>> On May 13, 2024, at 7:40 AM, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>
>>>> This is a great feature addition to CQL! I get asked about it from time to
>>>> time but then people figure out a workaround. It will be great to just
>>>> have it available.
>>>>
>>>> And right on Simon! I think the only project I had as a high school senior
>>>> was figuring out how many parties I could go to and still maintain a
>>>> passing grade. Thanks for your work here.
>>>>
>>>> Patrick
>>>>
>>>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer <ble...@apache.org> wrote:
>>>> Hi everybody,
>>>>
>>>> Just raising awareness that Simon is working on adding support for the
>>>> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
>>>> We plan to add support for it in conditions in a separate patch.
>>>>
>>>> The patch is available.
>>>>
>>>> As a side note, Simon chose to do his high school senior project
>>>> contributing to Apache Cassandra. This patch is his first contribution for
>>>> his senior project (his second feature contribution to Apache Cassandra).