Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread guo Maxwell
+1, as we must improve the image of our own default indexing ability. As for *CREATE CUSTOM INDEX*, should we just leave it as it is, and disable the ability to create SAI through *CREATE CUSTOM INDEX* in some version after 5.0? As far as I know, there may be users using this as a

Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Dinesh Joshi
I agree. 5.0 is a major release and provides an opportunity to switch defaults. > On May 9, 2023, at 7:00 PM, Jonathan Ellis wrote: > > +1 for this, especially in the long term. CREATE INDEX should do the right > thing for most people without requiring extra ceremony. > > On Tue, May 9, 2023

Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Dinesh Joshi
+1 > On May 8, 2023, at 1:52 AM, Piotr Kołaczkowski wrote: > > Let's vote. > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator > > Piotr Kołaczkowski > e. pkola...@datastax.com > w. www.datastax.com

Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Berenguer Blasi
+1 On 10/5/23 3:57, Jonathan Ellis wrote: +1 On Tue, May 9, 2023 at 12:04 PM Piotr Kołaczkowski wrote: Let's vote. https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator Piotr Kołaczkowski e. pkola...@datastax.com w. www.datastax.com

Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Jonathan Ellis
+1 for this, especially in the long term. CREATE INDEX should do the right thing for most people without requiring extra ceremony. On Tue, May 9, 2023 at 5:20 PM Jeremiah D Jordan wrote: > If the consensus is that SAI is the right default index, then we should > just change CREATE INDEX to be

Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Jonathan Ellis
+1 On Tue, May 9, 2023 at 12:04 PM Piotr Kołaczkowski wrote: > Let's vote. > > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator > > Piotr Kołaczkowski > e. pkola...@datastax.com > w. www.datastax.com > -- Jonathan Ellis co-founder, http://www.datastax.com

Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Jeremiah D Jordan
> If we assume SAI is what we should use by default for the cluster, would it > make sense to allow > > CREATE INDEX [IF NOT EXISTS] [name] ON () > > But use a new yaml config that switches from legacy to SAI? > > default_2i_impl: sai > > For 5.0 we can default to “legacy” (new features

Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Jeremiah D Jordan
If the consensus is that SAI is the right default index, then we should just change CREATE INDEX to be SAI, and legacy 2i to be a CUSTOM INDEX. > On May 9, 2023, at 4:44 PM, Caleb Rackliffe wrote: > > Earlier today, Mick started a thread on the future of our index creation DDL > on Slack: >

Re: [DISCUSS] The future of CREATE INDEX

2023-05-09 Thread David Capwell
If we assume SAI is what we should use by default for the cluster, would it make sense to allow CREATE INDEX [IF NOT EXISTS] [name] ON () But use a new yaml config that switches from legacy to SAI? default_2i_impl: sai For 5.0 we can default to “legacy” (new features disabled by default),
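
A minimal sketch of the config-switch idea proposed above, assuming the index implementation is resolved once per CREATE INDEX statement. Only the key `default_2i_impl` and the values "legacy" and "sai" come from the message; the function name, the `custom_class` parameter, and the validation behaviour are hypothetical and are not Cassandra's actual code.

```python
# Hypothetical illustration only: not Cassandra's implementation.
# `default_2i_impl`, "legacy" and "sai" come from the thread; everything else is assumed.
LEGACY = "legacy"
SAI = "sai"
VALID_IMPLS = {LEGACY, SAI}


def resolve_index_impl(config, custom_class=None):
    """Choose which index implementation a CREATE INDEX statement would use.

    config       -- dict standing in for the parsed yaml settings
    custom_class -- class name given via CREATE CUSTOM INDEX ... USING, if any
    """
    # An explicit CREATE CUSTOM INDEX always wins over the configured default.
    if custom_class is not None:
        return custom_class

    # Fall back to "legacy" when the setting is absent, matching the
    # "new features disabled by default" suggestion for 5.0.
    impl = config.get("default_2i_impl", LEGACY)
    if impl not in VALID_IMPLS:
        raise ValueError(f"unknown default_2i_impl: {impl!r}")
    return impl
```

With this shape, an operator who never touches the setting keeps legacy 2i, while setting `default_2i_impl: sai` changes what a plain CREATE INDEX produces without any change to the DDL itself.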

[DISCUSS] The future of CREATE INDEX

2023-05-09 Thread Caleb Rackliffe
Earlier today, Mick started a thread on the future of our index creation DDL on Slack: https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019 At the moment, there are two ways to create a secondary index. *1.) CREATE INDEX [IF NOT EXISTS] [name] ON ()* This creates an optionally

Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread Jeremy Hanna
Just wanted to add that I don't have any special knowledge of CEP-30 beyond what Jonathan posted; I'm just trying to help clarify and answer questions as I can, with some knowledge and experience from DSE Search and SAI. Thanks to Caleb for helping validate some things as well. And to be clear

Call for Presentations, Community Over Code 2023

2023-05-09 Thread Rich Bowen
(Note: You are receiving this because you are subscribed to the dev@ list for one or more Apache Software Foundation projects.) The Call for Presentations (CFP) for Community Over Code (formerly Apachecon) 2023 is open at https://communityovercode.org/call-for-presentations/, and will close Thu,

Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread Jeremy Hanna
I talked to David and some others in slack to hopefully clarify: With SAI, can you have partial results? When you have a query that is non-key based, you need to have full token range coverage of the results. If that isn't possible, will Vector Search/SAI return partial results? Anything can

Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Josh McKenzie
+1 On Tue, May 9, 2023, at 2:42 PM, Patrick McFadin wrote: > +1 > > On Tue, May 9, 2023 at 10:58 AM Caleb Rackliffe > wrote: >> +1 >> >> On Tue, May 9, 2023 at 12:04 PM Piotr Kołaczkowski >> wrote: >>> Let's vote. >>> >>>

Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread Caleb Rackliffe
Anyone on this ML who still remembers DSE Search (or has experience w/ Elastic or SolrCloud) probably also knows that there are some significant pieces of an optimized scatter/gather apparatus for IR (even without sorting, which also doesn't exist yet) that do not exist in C* or its range query

Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread Benedict
HNSW can in principle be made into a distributed index. But that would be quite a different paradigm to SAI. On 9 May 2023, at 19:30, Patrick McFadin wrote: Under the goals section, there is this line: Scatter/gather across replicas, combining topK from each to get global topK. But what I'm hearing

Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Patrick McFadin
+1 On Tue, May 9, 2023 at 10:58 AM Caleb Rackliffe wrote: > +1 > > On Tue, May 9, 2023 at 12:04 PM Piotr Kołaczkowski > wrote: > >> Let's vote. >> >> >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator >> >> Piotr Kołaczkowski >> e. pkola...@datastax.com >> w.

Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread Patrick McFadin
Under the goals section, there is this line: 1. Scatter/gather across replicas, combining topK from each to get global topK. But what I'm hearing is, exactly how will that happen? Maybe this is an SAI question too. How is that verified in SAI? On Tue, May 9, 2023 at 11:07 AM David
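
The "combine topK from each to get global topK" goal quoted above can be illustrated with a generic merge step. This is only a sketch of the textbook gather phase, not how CEP-30 or SAI implements it: the function name, the `(score, row_id)` result shape, and the assumption that a higher score means a closer match are all hypothetical.

```python
import heapq


def global_top_k(per_replica_results, k):
    """Merge per-replica top-K lists into one global top-K.

    per_replica_results -- iterable of lists of (score, row_id) tuples,
                           one list per replica (hypothetical shape)
    k                   -- number of results to return
    """
    # Replicas may return the same row, so keep the best score seen per row.
    best = {}
    for results in per_replica_results:
        for score, row_id in results:
            if row_id not in best or score > best[row_id]:
                best[row_id] = score

    # Take the k highest-scoring rows, best first.
    return heapq.nlargest(k, ((score, row_id) for row_id, score in best.items()))
```

The merge is only exact over the ranges that actually responded, and only if each replica returns at least its own local top K; how partial coverage and approximate per-replica results are handled is the open question raised in this thread.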

Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread David Capwell
Approach section doesn’t go over how this will handle cross replica search, this would be good to flesh out… given results have a real ranking, the current 2i logic may yield incorrect results… so would think we need num_ranges / rf queries in the best case, with some new capability to sort the

Re: [VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Caleb Rackliffe
+1 On Tue, May 9, 2023 at 12:04 PM Piotr Kołaczkowski wrote: > Let's vote. > > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator > > Piotr Kołaczkowski > e. pkola...@datastax.com > w. www.datastax.com >

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-09 Thread shailajakoppu
+1 > On 5 May 2023, at 11:44 am, Sam Tunnicliffe wrote: > > +1 > >> On 4 May 2023, at 17:46, Doug Rohrer wrote: >> >> Hello all, >> >> I’d like to put CEP-28 to a vote. >> >> Proposal: >> >>

Cassandra Contributor Meeting May 30

2023-05-09 Thread Melissa Logan
Hi folks, The Cassandra community will be hosting monthly Contributor Meetings the last Tuesday of each month at 10:00 PT / 13:00 ET / 17:00 UTC / 22:30 IST. The purpose of these meetings is to enable real-time collaboration for contributors to discuss CEPs and other issues, and ask questions.

CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread Jonathan Ellis
Hi all, Following the recent discussion threads, I would like to propose CEP-30 to add Approximate Nearest Neighbor (ANN) Vector Search via Storage-Attached Indexes (SAI) to Apache Cassandra. The primary goal of this proposal is to implement ANN vector search capabilities, making Cassandra more

[VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Piotr Kołaczkowski
Let's vote. https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator Piotr Kołaczkowski e. pkola...@datastax.com w. www.datastax.com

Re: [DISCUSS] CEP-29 CQL NOT Operator

2023-05-09 Thread Piotr Kołaczkowski
Ok, overall I think the discussion has settled and the feature is non-controversial, except the approach to ALLOW FILTERING. I added a note to non goals saying that we don't want to change the approach to ALLOW FILTERING here - and this proposal is to stay consistent with the current approach. We

[VOTE] CEP-29 CQL NOT Operator

2023-05-09 Thread Piotr Kołaczkowski
Hello, I'd like to start a vote on adding the NOT operator to CQL. CEP doc: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-29%3A+CQL+NOT+operator Thanks, Piotr Piotr Kołaczkowski e. pkola...@datastax.com w. www.datastax.com

[VOTE] Release Apache Cassandra 3.0.29

2023-05-09 Thread Miklosovic, Stefan
Proposing the test build of Cassandra 3.0.29 for release. sha1: 087cffce636b63c12e328994d52bdf8f4ccc9750 Git: https://github.com/apache/cassandra/tree/3.0.29-tentative Maven Artifacts:

RE: Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-09 Thread Saranya Krishnakumar
+1[nb] Best, Saranya On 2023/05/04 23:39:18 Francisco Guerrero wrote: > +1 (nb) > > On 2023/05/04 23:38:08 Yifan Cai wrote: > > +1 > > > > From: Jon Haddad > > Sent: Thursday, May 4, 2023 3:31:52 PM > > To: dev@cassandra.apache.org > > Subject: Re: [VOTE]