Re: [DISCUSS] Adding support for BETWEEN operator

David Capwell Wed, 15 May 2024 08:52:15 -0700

Thanks for the reply, Benjamin; makes sense to me. We can always add it to UPDATE 
later if it makes sense; we don't need it now.


> On May 15, 2024, at 7:44 AM, Jon Haddad <j...@jonhaddad.com> wrote:
> 
> I was trying to have a discussion about a technical possibility, not a cost 
> benefit analysis.  More of a "how could we technically reach mars?" 
> discussion than a "how we get congress to authorize a budget to reach mars?"
> 
> Happy to talk about this privately with anyone interested as I enjoy a 
> technical discussion for the sake of a good technical discussion.
> 
> Thanks,
> Jon
> 
> On Wed, May 15, 2024 at 7:18 AM Josh McKenzie <jmcken...@apache.org> wrote:
>> Is there a technical limitation that would prevent a range write that 
>> functions the same way as a range tombstone, other than probably needing a 
>> version bump of the storage format?
> The technical limitation would be cost/benefit, due to how this intersects 
> with our architecture, I think.
> 
> Range tombstones have taught us that something that should be relatively 
> simple (merging in a deletion mask at read time) introduces a significant 
> amount of complexity on all the paths Benjamin enumerated, with a pretty long 
> tail of bugs, data incorrectness issues, and edge cases. At a high-level 
> glance, the work to get there would be:
>     • Updates to the CQL grammar and spec
>     • Updates to the write path
>     • Updates to Accord, and thinking about how this intersects with Accord's 
> WAL/logic (I think? Consider me not well educated on the details here)
>     • Updates to compaction, with consideration for edge cases in all the 
> different compaction strategies
>     • Updates to iteration and merge logic
>     • Updates to paging logic
>     • Indexing
>     • Repair: full and incremental implications, support, etc.
>     • The list probably goes on? There's always >= 1 thing we're not thinking 
> of with a change like this. Usually more.
> For all of the above, we would also need extensive unit, integration, and 
> fuzz testing to ensure that introducing this new spanning concept on a write 
> doesn't create edge cases where incorrect data is returned on merge.
> 
> All of which is to say: it's an interesting problem, but IMO, given our 
> architecture and what we know from past attempts to introduce an 
> architectural concept like this, the cost of getting something like this 
> production-ready is pretty high.
> 
> To me, the costs and benefits don't really balance out. Just my $.02, though.
> 
> On Tue, May 14, 2024, at 2:50 PM, Benjamin Lerer wrote:
>> It would be a lot more constructive to apply our brains towards solving an 
>> interesting problem than pointing out all its potential flaws based on gut 
>> feelings.
>> 
>> It is not simply a gut feeling, Jon. This change impacts read, write, 
>> indexing, storage, compaction, repair... The risk and cost associated with 
>> it are pretty significant, and at this point I am not convinced of its 
>> benefit.
>> 
>> On Tue, May 14, 2024 at 19:05, Jon Haddad <j...@jonhaddad.com> wrote:
>> Personally, I don't think that something being scary at first glance is a 
>> good reason not to explore an idea. The scenario you've described here is 
>> tricky, but I'm not expecting it to be any worse than, say, SAI, which (the 
>> last I checked) has O(N) complexity in the number of rows returned when 
>> returning result sets. We've also merged in vector search, which has O(N) 
>> overhead in the number of SSTables. We're still fundamentally looking at, in 
>> most cases, a limited number of SSTables and some merging of values.
>> 
>> Range writes are essentially a timestamped mask, potentially overlapping, 
>> and I suspect they could be resolved during compaction by propagating the 
>> values. They could be eliminated or narrowed, based on how far they've 
>> propagated, by using the timestamp metadata on the SSTable.
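To make the "timestamped mask" idea concrete, here is a minimal Python sketch under a simplified model that is not Cassandra's actual storage engine or read path: point writes and hypothetical ranged writes each carry a timestamp, a range entry with no value stands in for a range tombstone, and a read resolves one clustering key by last-write-wins over every entry that covers it. The names `PointWrite`, `RangeWrite`, and `resolve` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


# Hypothetical, simplified model for illustration only; this is not
# Cassandra's storage format or read path.
@dataclass(frozen=True)
class PointWrite:
    key: int              # clustering key
    value: Optional[str]  # None models a point deletion
    timestamp: int


@dataclass(frozen=True)
class RangeWrite:
    start: int            # inclusive lower clustering bound
    end: int              # inclusive upper clustering bound
    value: Optional[str]  # None models a range tombstone; a string a ranged update
    timestamp: int

    def covers(self, key: int) -> bool:
        return self.start <= key <= self.end


Entry = Union[PointWrite, RangeWrite]


def resolve(key: int, entries: List[Entry]) -> Optional[str]:
    """Return the value visible at `key` using last-write-wins.

    Every entry that applies to `key` is considered and the one with the
    highest timestamp wins. Timestamp ties keep the first entry seen, which
    ignores Cassandra's real tie-breaking rules.
    """
    winner: Optional[Entry] = None
    for entry in entries:
        if isinstance(entry, PointWrite):
            applies = entry.key == key
        else:
            applies = entry.covers(key)
        if applies and (winner is None or entry.timestamp > winner.timestamp):
            winner = entry
    return None if winner is None else winner.value


entries = [
    PointWrite(1, "a", 10),
    PointWrite(2, "b", 10),
    PointWrite(5, "c", 10),
    RangeWrite(1, 3, "x", 20),   # ranged update over keys 1..3
    RangeWrite(2, 6, None, 15),  # overlapping range tombstone over keys 2..6
    PointWrite(2, "d", 25),      # newer point write inside both ranges
]

for k in (1, 2, 3, 5, 7):
    print(k, resolve(k, entries))
```

Even in this toy model, every read has to consider each range entry that overlaps the key, and unlike a tombstone, each ranged update carries its own value that must survive merging; compaction would need the same resolution logic.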
>> 
>> It would be a lot more constructive to apply our brains towards solving an 
>> interesting problem than pointing out all its potential flaws based on gut 
>> feelings.  We haven't even moved this past an idea.  
>> 
>> I think it would solve a massive problem for a lot of people and is 100% 
>> worth considering.  Thanks Patrick and David for raising this.
>> 
>> Jon
>> 
>> 
>> 
>> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev 
>> <dev@cassandra.apache.org> wrote:
>> 
>> A ranged update sounds like a disaster for compaction and read performance.
>> Imagine compacting or reading some SSTables in which a large number of 
>> overlapping but non-identical ranges were updated with different values. It 
>> gives me a headache just thinking about it.
>> A ranged delete is much simpler, because the "value" is always the same 
>> tombstone marker, and it is also guaranteed to expire and disappear 
>> eventually, so the performance cost of dealing with them at read and 
>> compaction time doesn't persist in the long term.
>> 
>> On 14/05/2024 16:59, Benjamin Lerer wrote:
>>> It would be like range tombstones... but much worse ;-). A tombstone is a 
>>> simple marker (deleted). An update can be far more complex.
>>> 
>>> On Tue, May 14, 2024 at 15:52, Jon Haddad <j...@jonhaddad.com> wrote:
>>> Is there a technical limitation that would prevent a range write that 
>>> functions the same way as a range tombstone, other than probably needing a 
>>> version bump of the storage format?
>>> 
>>> 
>>> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer <ble...@apache.org> wrote:
>>> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. They 
>>> do work on DELETE because, under the hood, C* translates them into range 
>>> tombstones.
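A minimal Python sketch of that translation, assuming a single integer clustering column and assuming BETWEEN is inclusive on both ends, as in SQL. Each operator maps to a slice with inclusive or exclusive bounds, which is the kind of interval a range tombstone covers (for example, a restriction such as `ck BETWEEN 2 AND 5` in a DELETE). `Bound`, `Slice`, and `restriction_to_slice` are illustrative names only and are not meant to mirror Cassandra's internal classes.

```python
from dataclasses import dataclass
from typing import Optional


# Illustrative model only: one integer clustering column.
@dataclass(frozen=True)
class Bound:
    value: Optional[int]    # None means unbounded on this side
    inclusive: bool = True


@dataclass(frozen=True)
class Slice:
    start: Bound
    end: Bound

    def contains(self, ck: int) -> bool:
        """True if clustering value `ck` falls inside this slice."""
        s, e = self.start, self.end
        if s.value is not None:
            if ck < s.value or (ck == s.value and not s.inclusive):
                return False
        if e.value is not None:
            if ck > e.value or (ck == e.value and not e.inclusive):
                return False
        return True


UNBOUNDED = Bound(None)


def restriction_to_slice(op: str, *values: int) -> Slice:
    """Map one clustering restriction to the slice a range tombstone would cover."""
    if op == "BETWEEN":
        lo, hi = values  # assumed inclusive on both ends
        return Slice(Bound(lo, True), Bound(hi, True))
    (v,) = values
    if op == ">":
        return Slice(Bound(v, False), UNBOUNDED)
    if op == ">=":
        return Slice(Bound(v, True), UNBOUNDED)
    if op == "<":
        return Slice(UNBOUNDED, Bound(v, False))
    if op == "<=":
        return Slice(UNBOUNDED, Bound(v, True))
    raise ValueError(f"unsupported operator: {op}")


between = restriction_to_slice("BETWEEN", 2, 5)
print([ck for ck in range(8) if between.contains(ck)])
```

A single deletion marker over such a slice is enough for DELETE; an UPDATE over the same slice would also have to carry values, which is the difference raised in the replies above.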
>>> 
>>> On Tue, May 14, 2024 at 02:44, David Capwell <dcapw...@apple.com> wrote:
>>> I would also include it in UPDATE… but yeah, <3 BETWEEN, and I welcome this work.
>>> 
>>>> On May 13, 2024, at 7:40 AM, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>> 
>>>> This is a great feature addition to CQL! I get asked about it from time to 
>>>> time but then people figure out a workaround. It will be great to just 
>>>> have it available. 
>>>> 
>>>> And right on Simon! I think the only project I had as a high school senior 
>>>> was figuring out how many parties I could go to and still maintain a 
>>>> passing grade. Thanks for your work here. 
>>>> 
>>>> Patrick 
>>>> 
>>>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer <ble...@apache.org> wrote:
>>>> Hi everybody,
>>>> 
>>>> Just raising awareness that Simon is working on adding support for the 
>>>> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604. 
>>>> We plan to add support for it in conditions in a separate patch.
>>>> 
>>>> The patch is available.
>>>> 
>>>> As a side note, Simon chose to do his high school senior project 
>>>> contributing to Apache Cassandra. This patch is his first contribution for 
>>>> his senior project (his second feature contribution to Apache Cassandra).
>>>> 
>>>> 
>
