I was trying to have a discussion about a technical possibility, not a cost-benefit analysis. More of a "how could we technically reach Mars?" discussion than a "how do we get Congress to authorize a budget to reach Mars?" one.
Happy to talk about this privately with anyone interested as I enjoy a technical discussion for the sake of a good technical discussion.

Thanks,
Jon

On Wed, May 15, 2024 at 7:18 AM Josh McKenzie <jmcken...@apache.org> wrote:

> Is there a technical limitation that would prevent a range write that
> functions the same way as a range tombstone, other than probably needing a
> version bump of the storage format?
>
> The technical limitation would be cost/benefit due to how this intersects
> w/our architecture I think.
>
> Range tombstones have taught us that something that should be relatively
> simple (merge in deletion mask at read time) introduces a significant
> amount of complexity on all the paths Benjamin enumerated with a pretty
> long tail of bugs and data incorrectness issues and edge cases. The work to
> get there, at a high level glance, would be:
>
> 1. Updates to CQL grammar, spec
> 2. Updates to write path
> 3. Updates to accord. And thinking about how this intersects
>    w/accord's WAL / logic (I think? Consider me not well educated on details
>    here)
> 4. Updates to compaction w/consideration for edge cases on all the
>    different compaction strategies
> 5. Updates to iteration and merge logic
> 6. Updates to paging logic
> 7. Indexing
> 8. repair, both full and incremental implications, support, etc
> 9. the list probably goes on? There's always >= 1 thing we're not
>    thinking of with a change like this. Usually more.
>
> For all of the above we also would need unit, integration, and fuzz
> testing extensively to ensure the introduction of this new spanning concept
> on a write doesn't introduce edge cases where incorrect data is returned on
> merge.
>
> All of which is to say: it's an interesting problem, but IMO given our
> architecture and what we know about the past of trying to introduce an
> architectural concept like this, the costs to getting something like this
> to production ready are pretty high.
>
> To me the cost/benefit don't really balance out. Just my .02 though.
>
> On Tue, May 14, 2024, at 2:50 PM, Benjamin Lerer wrote:
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.
>
> It is not simply a gut feeling, Jon. This change impacts read, write,
> indexing, storage, compaction, repair... The risk and cost associated with
> it are pretty significant and I am not convinced at this point of its
> benefit.
>
> On Tue, May 14, 2024 at 19:05, Jon Haddad <j...@jonhaddad.com> wrote:
>
> Personally, I don't think that something being scary at first glance is a
> good reason not to explore an idea. The scenario you've described here is
> tricky but I'm not expecting it to be any worse than, say, SAI, which (the
> last I checked) has O(N) complexity on returning result sets with regard to
> rows returned. We've also merged in Vector search which has O(N) overhead
> with the number of SSTables. We're still fundamentally looking at, in most
> cases, a limited number of SSTables and some merging of values.
>
> Write updates are essentially a timestamped mask, potentially overlapping,
> and I suspect potentially resolvable during compaction by propagating the
> values. They could be eliminated or narrowed based on how they've
> propagated by using the timestamp metadata on the SSTable.
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings. We haven't even moved this past an idea.
>
> I think it would solve a massive problem for a lot of people and is 100%
> worth considering. Thanks Patrick and David for raising this.
>
> Jon
>
> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev <dev@cassandra.apache.org> wrote:
>
> Ranged update sounds like a disaster for compaction and read performance.
>
> Imagine compacting or reading some SSTables in which a large number of
> overlapping but non-identical ranges were updated with different values. It
> gives me a headache just thinking about it.
>
> Ranged delete is much simpler, because the "value" is the same tombstone
> marker, and it also is guaranteed to expire and disappear eventually, so
> read and compaction performance doesn't suffer in the long term.
>
> On 14/05/2024 16:59, Benjamin Lerer wrote:
>
> It should be like range tombstones ... but much worse ;-). A tombstone is a
> simple marker (deleted). An update can be far more complex.
>
> On Tue, May 14, 2024 at 15:52, Jon Haddad <j...@jonhaddad.com> wrote:
>
> Is there a technical limitation that would prevent a range write that
> functions the same way as a range tombstone, other than probably needing a
> version bump of the storage format?
>
> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer <ble...@apache.org> wrote:
>
> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. They
> do work on DELETE because under the hood in C* they get translated into
> range tombstones.
>
> On Tue, May 14, 2024 at 02:44, David Capwell <dcapw...@apple.com> wrote:
>
> I would also include in UPDATE… but yeah, <3 BETWEEN and welcome this work.
>
> On May 13, 2024, at 7:40 AM, Patrick McFadin <pmcfa...@gmail.com> wrote:
>
> This is a great feature addition to CQL! I get asked about it from time to
> time but then people figure out a workaround. It will be great to just have
> it available.
>
> And right on, Simon! I think the only project I had as a high school senior
> was figuring out how many parties I could go to and still maintain a
> passing grade. Thanks for your work here.
>
> Patrick
>
> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer <ble...@apache.org> wrote:
>
> Hi everybody,
>
> Just raising awareness that Simon is working on adding support for the
> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
> We plan to add support for it in conditions in a separate patch.
>
> The patch is available.
>
> As a side note, Simon chose to do his high school senior project
> contributing to Apache Cassandra. This patch is his first contribution for
> his senior project (his second feature contribution to Apache Cassandra).
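
For anyone following the range-write part of the thread, here is a toy sketch of the "timestamped mask, potentially overlapping" idea, resolved at read time with last-write-wins. It is not Cassandra code and does not reflect the storage engine: `RangeWrite`, `TOMBSTONE`, `resolve` and `read` are hypothetical names, clustering keys are plain integers, bounds are inclusive (as with BETWEEN), a range write only affects rows that already exist, and timestamp ties keep the value seen first. All of those are simplifying assumptions.

```python
from dataclasses import dataclass

# Hypothetical sentinel standing in for a deletion marker (a range tombstone
# is then just a range write whose value is TOMBSTONE).
TOMBSTONE = object()


@dataclass(frozen=True)
class RangeWrite:
    start: int        # inclusive lower bound on the clustering key
    end: int          # inclusive upper bound (BETWEEN-style)
    timestamp: int    # write timestamp used for last-write-wins
    value: object     # new value, or TOMBSTONE for a ranged delete


def resolve(key, point_ts, point_value, range_writes):
    """Return the value visible for one row after applying the masks.

    The row's own (timestamp, value) competes with every range write that
    covers the key; the strictly newest timestamp wins.
    """
    best_ts, best_value = point_ts, point_value
    for rw in range_writes:
        if rw.start <= key <= rw.end and rw.timestamp > best_ts:
            best_ts, best_value = rw.timestamp, rw.value
    return None if best_value is TOMBSTONE else best_value


def read(rows, range_writes):
    """Merge point rows {key: (timestamp, value)} with overlapping masks."""
    visible = {}
    for key, (ts, value) in sorted(rows.items()):
        resolved = resolve(key, ts, value, range_writes)
        if resolved is not None:  # rows hidden by a tombstone are dropped
            visible[key] = resolved
    return visible


rows = {1: (10, "a"), 5: (10, "b"), 9: (30, "c")}
masks = [
    RangeWrite(0, 5, 20, "x"),         # ranged update over keys 0..5
    RangeWrite(4, 10, 25, "y"),        # overlapping, newer ranged update
    RangeWrite(9, 9, 40, TOMBSTONE),   # ranged delete covering key 9
]
result = read(rows, masks)
```

Even in this toy form, every read has to consult every overlapping mask, and unlike a tombstone each mask carries its own value, which is the extra merge work Bowen and Benjamin point at. Compaction, paging, indexing and repair are not modeled at all.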