I was trying to have a discussion about a technical possibility, not a
cost/benefit analysis.  More of a "how could we technically reach Mars?"
discussion than a "how do we get Congress to authorize a budget to reach Mars?"

Happy to talk about this privately with anyone interested as I enjoy a
technical discussion for the sake of a good technical discussion.

Thanks,
Jon

On Wed, May 15, 2024 at 7:18 AM Josh McKenzie <jmcken...@apache.org> wrote:

> Is there a technical limitation that would prevent a range write that
> functions the same way as a range tombstone, other than probably needing a
> version bump of the storage format?
>
> The technical limitation would be cost/benefit due to how this intersects
> w/our architecture I think.
>
> Range tombstones have taught us that something that should be relatively
> simple (merge in deletion mask at read time) introduces a significant
> amount of complexity on all the paths Benjamin enumerated with a pretty
> long tail of bugs and data incorrectness issues and edge cases. The work to
> get there, at a high level glance, would be:
>
>    1. Updates to CQL grammar, spec
>    2. Updates to write path
>    3. Updates to accord. And thinking about how this intersects
>    w/accord's WAL / logic (I think? Consider me not well educated on details
>    here)
>    4. Updates to compaction w/consideration for edge cases on all the
>    different compaction strategies
>    5. Updates to iteration and merge logic
>    6. Updates to paging logic
>    7. Indexing
>    8. repair, both full and incremental implications, support, etc
>    9. the list probably goes on? There's always >= 1 thing we're not
>    thinking of with a change like this. Usually more.
>
> For all of the above we also would need unit, integration, and fuzz
> testing extensively to ensure the introduction of this new spanning concept
> on a write doesn't introduce edge cases where incorrect data is returned on
> merge.
>
> All of which is to say: it's an interesting problem, but IMO given our
> architecture and what we know from past attempts to introduce an
> architectural concept like this, the cost of getting something like this
> production ready is pretty high.
>
> To me the cost/benefit don't really balance out. Just my .02 though.
>
> On Tue, May 14, 2024, at 2:50 PM, Benjamin Lerer wrote:
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.
>
>
> It is not simply a gut feeling, Jon. This change impacts read, write,
> indexing, storage, compaction, repair... The risk and cost associated with
> it are pretty significant and I am not convinced at this point of its
> benefit.
>
> On Tue, May 14, 2024 at 7:05 PM Jon Haddad <j...@jonhaddad.com> wrote:
>
> Personally, I don't think that something being scary at first glance is a
> good reason not to explore an idea.  The scenario you've described here is
> tricky but I'm not expecting it to be any worse than say, SAI, which (the
> last I checked) has O(N) complexity on returning result sets with regard to
> rows returned.  We've also merged in Vector search which has O(N) overhead
> with the number of SSTables.  We're still fundamentally looking at, in most
> cases, a limited number of SSTables and some merging of values.
>
> Write updates are essentially a timestamped mask, potentially overlapping,
> and I suspect potentially resolvable during compaction by propagating the
> values.  They could be eliminated or narrowed based on how they've
> propagated by using the timestamp metadata on the SSTable.
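The "timestamped mask" idea above can be sketched as follows. This is only an illustration of the concept, not Cassandra code: the `RangeWrite` type and `flatten` function are hypothetical names, clustering keys are assumed to be integers with inclusive bounds, and resolution is assumed to be plain last-write-wins on timestamp (timestamp ties are not modelled). It shows how a compaction-like pass could collapse overlapping range writes into disjoint segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeWrite:
    """Hypothetical range write: one value applied to every key in [start, end]."""
    start: int       # inclusive lower bound (integer keys assumed)
    end: int         # inclusive upper bound
    value: str
    timestamp: int   # write timestamp; the newest write wins


def flatten(range_writes):
    """Collapse overlapping range writes into disjoint (lo, hi, value, timestamp)
    segments, keeping the newest write for each segment."""
    # Every start and every end + 1 is a point where the winner may change.
    bounds = sorted({w.start for w in range_writes} | {w.end + 1 for w in range_writes})
    segments = []
    for lo, next_lo in zip(bounds, bounds[1:]):
        hi = next_lo - 1
        covering = [w for w in range_writes if w.start <= lo and hi <= w.end]
        if not covering:
            continue  # gap between ranges: nothing was written here
        winner = max(covering, key=lambda w: w.timestamp)
        same_as_prev = (
            segments
            and segments[-1][1] + 1 == lo
            and segments[-1][2:] == (winner.value, winner.timestamp)
        )
        if same_as_prev:
            # Extend the previous segment instead of emitting a new one.
            prev = segments.pop()
            segments.append((prev[0], hi, winner.value, winner.timestamp))
        else:
            segments.append((lo, hi, winner.value, winner.timestamp))
    return segments


writes = [
    RangeWrite(0, 10, "a", timestamp=1),
    RangeWrite(5, 15, "b", timestamp=2),
    RangeWrite(8, 9, "c", timestamp=3),
]
for segment in flatten(writes):
    print(segment)
```

Three overlapping writes become four disjoint segments here, which hints at both sides of the argument: the merge itself is mechanical, but the number of fragments grows with the number of overlapping, non-identical ranges.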
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.  We haven't even moved this past an idea.
>
> I think it would solve a massive problem for a lot of people and is 100%
> worth considering.  Thanks Patrick and David for raising this.
>
> Jon
>
>
>
> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev <
> dev@cassandra.apache.org> wrote:
>
>
> Ranged update sounds like a disaster for compaction and read performance.
>
> Imagine compacting or reading some SSTables in which a large number of
> overlapping but non-identical ranges were updated with different values. It
> gives me a headache just thinking about it.
>
> Ranged delete is much simpler, because the "value" is the same tombstone
> marker, and it also is guaranteed to expire and disappear eventually, so
> the performance impact of dealing with them at read and compaction time
> doesn't suffer in the long term.
>
> On 14/05/2024 16:59, Benjamin Lerer wrote:
>
> It should be like range tombstones ... but much worse ;-). A tombstone is a
> simple marker (deleted). An update can be far more complex.
>
> On Tue, May 14, 2024 at 3:52 PM Jon Haddad <j...@jonhaddad.com> wrote:
>
> Is there a technical limitation that would prevent a range write that
> functions the same way as a range tombstone, other than probably needing a
> version bump of the storage format?
>
>
> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer <ble...@apache.org> wrote:
>
> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. They
> do work on DELETE because under the hood in C* they get translated into
> range tombstones.
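To make the range tombstone point concrete, here is a minimal sketch of a deletion mask applied at read time. It is a simplified model, not the actual storage engine: `Row`, `RangeTombstone`, and `apply_tombstones` are hypothetical names, clustering keys are assumed to be integers with inclusive bounds, and a tombstone is assumed to shadow any row whose timestamp is less than or equal to its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    clustering: int   # clustering key value (integer assumed for simplicity)
    value: str
    timestamp: int    # write timestamp


@dataclass(frozen=True)
class RangeTombstone:
    start: int        # inclusive lower bound
    end: int          # inclusive upper bound
    timestamp: int    # deletion timestamp


def apply_tombstones(rows, tombstones):
    """Return the rows that are still live after applying the deletion mask."""
    live = []
    for row in rows:
        shadowed = any(
            t.start <= row.clustering <= t.end and t.timestamp >= row.timestamp
            for t in tombstones
        )
        if not shadowed:
            live.append(row)
    return live


rows = [
    Row(1, "a", timestamp=10),
    Row(5, "b", timestamp=10),   # inside the deleted range, older than the delete
    Row(7, "c", timestamp=30),   # inside the range, but written after the delete
    Row(12, "d", timestamp=10),
]
deleted = [RangeTombstone(3, 10, timestamp=20)]
print([r.clustering for r in apply_tombstones(rows, deleted)])
```

The mask carries no payload beyond "deleted", which is why the same approach does not carry over directly to UPDATE: a range update would have to carry a value per range as well.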
>
> On Tue, May 14, 2024 at 2:44 AM David Capwell <dcapw...@apple.com> wrote:
>
> I would also include in UPDATE… but yeah, <3 BETWEEN and welcome this work.
>
> On May 13, 2024, at 7:40 AM, Patrick McFadin <pmcfa...@gmail.com> wrote:
>
> This is a great feature addition to CQL! I get asked about it from time to
> time but then people figure out a workaround. It will be great to just have
> it available.
>
> And right on Simon! I think the only project I had as a high school senior
> was figuring out how many parties I could go to and still maintain a
> passing grade. Thanks for your work here.
>
> Patrick
>
> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer <ble...@apache.org> wrote:
>
> Hi everybody,
>
> Just raising awareness that Simon is working on adding support for the
> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
> We plan to add support for it in conditions in a separate patch.
>
> The patch is available.
>
> As a side note, Simon chose to do his high school senior project
> contributing to Apache Cassandra. This patch is his first contribution for
> his senior project (his second feature contribution to Apache Cassandra).
>
>
>
>
