So this thread stalled almost a year ago. (Wow, time flies when you're trying to release 4.0.) My synthesis of the conversation to this point is that while there are some open questions about testing methodology/"definition of done" and our choice of particular on-disk data structures, neither of these should be a serious obstacle to moving forward w/ a vote. Having said that, is there anything left around the CEP that we feel should prevent it from moving to a vote?
In terms of how we would proceed from the point a vote passes, it seems like there have been enough concerns around the proposed/necessary breaking changes to the 2i API, that we will start development by introducing components as incrementally as possible into a long-running feature branch off trunk. (This work would likely start w/ *CASSANDRA-16092* <https://issues.apache.org/jira/browse/CASSANDRA-16092>, which we could resolve as a sub-task of the SAI epic without interfering with other trunk development likely destined for a 4.x minor, etc.)

On Thu, Sep 24, 2020 at 2:47 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:

> >> Question is: is this planned as a next step?
> >> If yes, how are we going to mark SAI as experimental until it gets row offsets? Also, it is likely that index format is going to change when row offsets are added, so my concern is that we may have to support two versions of a format for a smooth migration.
>
> The goal is to support row-level index when merging SAI, I will update the CEP about it.
>
> >> I think switching to row offsets also has a huge impact on interaction with SPRC and has some potential for optimisations.
>
> Can you share more details on the optimizations?
>
> On Thu, 24 Sep 2020 at 15:20, Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:
>
> > > But for improving overall index read performance, I think improving base table read perf (because SAI/SASI executes LOTS of SinglePartitionReadCommand after searching on-disk index) is more effective than switching from Trie to Prefix BTree.
> >
> > I haven't suggested switching to Prefix B-Tree or any other structure, the question was about rationale and motivation of picking one over the other, which I am curious about for personal reasons/interests that lie outside of Cassandra. Having this listed in CEP could have been helpful for future guidance.
> > It's ok if this question is outside of the CEP scope.
> >
> > I also agree that there are many areas that require improvement around the read/write path and 2i, many of which (even outside of base table format or read perf) can yield positive performance results.
> >
> > > FWIW, I personally look forward to receiving that contribution when the time is right.
> >
> > I am very excited for this contribution, too, and it looks like very solid work.
> >
> > I have one more question, about "Upon resolving partition keys, rows are loaded using Cassandra’s internal partition read command across SSTables and are post filtered". One of the criticisms of SASI and reasons for marking it as experimental was CASSANDRA-11990. I think switching to row offsets also has a huge impact on interaction with SPRC and has some potential for optimisations. Question is: is this planned as a next step? If yes, how are we going to mark SAI as experimental until it gets row offsets? Also, it is likely that index format is going to change when row offsets are added, so my concern is that we may have to support two versions of a format for a smooth migration.
> >
> > On Thu, Sep 24, 2020 at 6:53 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:
> >
> > > >> I think CEP should be more upfront with "eventually replace it" bit, since it raises the question about what the people who are using other index implementations can expect.
> > >
> > > Will update the CEP to emphasize: SAI will replace other indexes.
> > >
> > > >> Unfortunately, I do not have an implementation sitting around for a direct comparison, but I can imagine situations when B-Trees may perform better because of simpler construction. Maybe we should even consider prototyping a prefix B-Tree to have a more fair comparison.
> > >
> > > As long as prefix BTree supports range/prefix aggregation (which is used to speed up range/prefix query when matching entire subtree), we can plug it in and compare. It won't affect the CEP design which focuses on sharing data across indexes and posting aggregation.
> > >
> > > But for improving overall index read performance, I think improving base table read perf (because SAI/SASI executes LOTS of SinglePartitionReadCommand after searching on-disk index) is more effective than switching from Trie to Prefix BTree.
> > >
> > > On Thu, 24 Sep 2020 at 05:33, Benedict Elliott Smith <bened...@apache.org> wrote:
> > >
> > > > FWIW, I personally look forward to receiving that contribution when the time is right.
> > > >
> > > > On 23/09/2020, 18:45, "Josh McKenzie" <jmcken...@apache.org> wrote:
> > > >
> > > >     talking about that would involve some bits of information DataStax might not be ready to share?
> > > >
> > > >     At the risk of derailing, I've been poking and prodding this week at we contributors at DS getting our act together w/a draft CEP for donating the trie-based indices to the ASF project.
> > > >
> > > >     More to come; the intention is certainly to contribute that code. The lack of a destination to merge it into (i.e. no 5.0-dev branch) is removing significant urgency from the process as well (not to open a 3rd Pandora's box), but there's certainly an interrelatedness to the conversations going on.
> > > >
> > > >     ---
> > > >     Josh McKenzie
> > > >
> > > >     Sent via Superhuman <https://sprh.mn/?vip=jmcken...@apache.org>
> > > >
> > > >     On Wed, Sep 23, 2020 at 12:48 PM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
> > > >
> > > > > As long as we can construct the on-disk indexes efficiently/directly from a Memtable-attached index on flush, there's room to try other data structures. Most of the innovation in SAI is around the layout of postings (something we can expand on if people are interested) and having a natively row-oriented design that scales w/ multiple indexed columns on single SSTables. There are some broader implications of using the trie that reach outside SAI itself, but talking about that would involve some bits of information DataStax might not be ready to share?
> > > > >
> > > > > On Wed, Sep 23, 2020 at 11:00 AM Jeremiah D Jordan <jeremiah.jordan@gmail.com> wrote:
> > > > >
> > > > > Short question: looking forward, how are we going to maintain three 2i implementations: SASI, SAI, and 2i?
> > > > >
> > > > > I think one of the goals stated in the CEP is for SAI to have parity with 2i such that it could eventually replace it.
> > > > >
> > > > > On Sep 23, 2020, at 10:34 AM, Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:
> > > > >
> > > > > Short question: looking forward, how are we going to maintain three 2i implementations: SASI, SAI, and 2i?
> > > > >
> > > > > Another thing I think this CEP is missing is rationale and motivation about why trie-based indexes were chosen over, say, B-Tree.
> > > > > We did have a short discussion about this on Slack, but both arguments that I've heard (space-saving and keeping a small subset of nodes in memory) work only for the most primitive implementation of a B-Tree. Fully-occupied prefix B-Tree can have similar properties. There's been a lot of research on B-Trees and optimisations in those. Unfortunately, I do not have an implementation sitting around for a direct comparison, but I can imagine situations when B-Trees may perform better because of simpler construction. Maybe we should even consider prototyping a prefix B-Tree to have a more fair comparison.
> > > > >
> > > > > Thank you,
> > > > > -- Alex
> > > > >
> > > > > On Thu, Sep 10, 2020 at 9:12 AM Jasonstack Zhao Yang <jasonstack.zhao@gmail.com> wrote:
> > > > >
> > > > > Thank you Patrick for hosting Cassandra Contributor Meeting for CEP-7 SAI.
> > > > >
> > > > > The recorded video is available here:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-01+Apache+Cassandra+Contributor+Meeting
> > > > >
> > > > > On Tue, 1 Sep 2020 at 14:34, Jasonstack Zhao Yang <jasonstack.zhao@gmail.com> wrote:
> > > > >
> > > > > Thank you, Charles and Patrick
> > > > >
> > > > > On Tue, 1 Sep 2020 at 04:56, Charles Cao <caohair...@gmail.com> wrote:
> > > > >
> > > > > Thank you, Patrick!
> > > > >
> > > > > On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
> > > > >
> > > > > I just moved it to 8AM for this meeting to better accommodate APAC.
> > > > > Please see the update here:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
> > > > >
> > > > > Patrick
> > > > >
> > > > > On Mon, Aug 31, 2020 at 10:04 AM Charles Cao <caohair...@gmail.com> wrote:
> > > > >
> > > > > Patrick,
> > > > >
> > > > > 11AM PST is a bad time for the people in the APAC timezone. Can we move it to 7 or 8AM PST in the morning to accommodate their needs ?
> > > > >
> > > > > ~Charles
> > > > >
> > > > > On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
> > > > >
> > > > > Meeting scheduled.
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
> > > > >
> > > > > Tuesday September 1st, 11AM PST. I added a basic bullet for the agenda but if there is more, edit away.
> > > > >
> > > > > Patrick
> > > > >
> > > > > On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang <jasonstack.zhao@gmail.com> wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
> > > > >
> > > > > This is related to the discussion Jordan and I had about the contributor Zoom call.
> > > > > Instead of open mic for any issue, call it based on a discussion thread or threads for higher bandwidth discussion.
> > > > >
> > > > > I would be happy to schedule on for next week to specifically discuss CEP-7. I can attach the recorded call to the CEP after.
> > > > >
> > > > > +1 or -1?
> > > > >
> > > > > Patrick
> > > > >
> > > > > On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie <jmcken...@apache.org> wrote:
> > > > >
> > > > > Does community plan to open another discussion or CEP on modularization?
> > > > >
> > > > > We probably should have a discussion on the ML or monthly contrib call about it first to see how aligned the interested contributors are. Could do that through CEP as well but CEP's (at least thus far sans k8s operator) tend to start with a strong, deeply thought out point of view being expressed.
> > > > >
> > > > > On Tue, Aug 25, 2020 at 3:26 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:
> > > > >
> > > > > SASI's performance, specifically the search in the B+ tree component, depends a lot on the component file's header being available in the pagecache. SASI benefits from (needs) nodes with lots of RAM.
> > > > > Is SAI bound to this same or similar limitation?
> > > > >
> > > > > SAI also benefits from larger memory because SAI puts block info on heap for searching on-disk components and having cross-index files on page cache improves read performance of different indexes on the same table.
> > > > >
> > > > > Flushing of SASI can be CPU+IO intensive, to the point of saturation, pauses, and crashes on the node. SSDs are a must, along with a bit of tuning, just to avoid bringing down your cluster. Beyond reducing space requirements, does SAI improve on these things? Like SASI how does SAI, in its own way, change/narrow the recommendations on node hardware specs?
> > > > >
> > > > > SAI won't crash the node during compaction and requires less CPU/IO.
> > > > >
> > > > > * SAI defines global memory limit for compaction instead of per-index memory limit used by SASI.
> > > > > For example, compactions are running on 10 tables and each has 10 indexes.
> > > > > SAI will cap the memory usage with global limit while SASI may use up to 100 * per-index limit.
> > > > >
> > > > > * After flushing in-memory segments to disk, SAI won't merge on-disk segments while SASI attempts to merge them at the end.
> > > > > There are pros and cons of not merging segments:
> > > > > ** Pros: compaction runs faster and requires fewer resources.
> > > > > ** Cons: small segments reduce compression ratio.
> > > > >
> > > > > * SAI on-disk format with row ids compresses better.
> > > > >
> > > > > I understand the desire in keeping out of scope the longer term deprecation and migration plan, but… if SASI provides functionality that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet introduces a body of code ~somewhat similar, shouldn't we be roughly sketching out how to reduce the maintenance surface area?
> > > > >
> > > > > Agreed that we should reduce maintenance area if possible, but only very limited code base (eg. RangeIterator, QueryPlan) can be shared.
> > > > > The rest of the code base is quite different because of on-disk format and cross-index files.
> > > > >
> > > > > The goal of this CEP is to get community buy-in on SAI's design. Tokenization, DelimiterAnalyzer should be straightforward to implement on top of SAI.
> > > > >
> > > > > Can we list what configurations of SASI will become deprecated once SAI becomes non-experimental?
> > > > >
> > > > > Except for "Like", "Tokenisation", "DelimiterAnalyzer", the rest of SASI can be replaced by SAI.
> > > > >
> > > > > Given a few bugs are open against 2i and SASI, can we provide some overview, or rough indication, of how many of them we could "triage away"?
> > > > >
> > > > > I believe most of the known bugs in 2i/SASI either have been addressed in SAI or don't apply to SAI.
> > > > >
> > > > > And, is it time for the project to start introducing new SPI implementations as separate sub-modules and jar files that are only loaded at runtime based on configuration settings? (sorry for the conflation on this one, but maybe it's the right time to raise it :shrug:)
> > > > >
> > > > > Agreed that modularization is the way to go and will speed up module development speed.
> > > > >
> > > > > Does community plan to open another discussion or CEP on modularization?
> > > > >
> > > > > On Mon, 24 Aug 2020 at 16:43, Mick Semb Wever <m...@apache.org> wrote:
> > > > >
> > > > > Adding to Duy's questions…
> > > > >
> > > > > * Hardware specs
> > > > >
> > > > > SASI's performance, specifically the search in the B+ tree component, depends a lot on the component file's header being available in the pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound to this same or similar limitation?
> > > > >
> > > > > Flushing of SASI can be CPU+IO intensive, to the point of saturation, pauses, and crashes on the node.
> > > > > SSDs are a must, along with a bit of tuning, just to avoid bringing down your cluster. Beyond reducing space requirements, does SAI improve on these things? Like SASI how does SAI, in its own way, change/narrow the recommendations on node hardware specs?
> > > > >
> > > > > * Code Maintenance
> > > > >
> > > > > I understand the desire in keeping out of scope the longer term deprecation and migration plan, but… if SASI provides functionality that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet introduces a body of code ~somewhat similar, shouldn't we be roughly sketching out how to reduce the maintenance surface area?
> > > > >
> > > > > Can we list what configurations of SASI will become deprecated once SAI becomes non-experimental?
> > > > >
> > > > > Given a few bugs are open against 2i and SASI, can we provide some overview, or rough indication, of how many of them we could "triage away"?
> > > > >
> > > > > And, is it time for the project to start introducing new SPI implementations as separate sub-modules and jar files that are only loaded at runtime based on configuration settings? (sorry for the conflation on this one, but maybe it's the right time to raise it :shrug:)
> > > > >
> > > > > regards,
> > > > > Mick
> > > > >
> > > > > On Tue, 18 Aug 2020 at 13:05, DuyHai Doan <doanduy...@gmail.com> wrote:
> > > > >
> > > > > Thank you Zhao Yang for starting this topic
> > > > >
> > > > > After reading the short design doc, I have a few questions
> > > > >
> > > > > 1) SASI was pretty inefficient indexing wide partitions because the index structure only retains the partition token, not the clustering columns. As per design doc SAI has row id mapping to partition offset, can we hope that indexing wide partition will be more efficient with SAI? One detail that worries me is that in the beginning of the design doc, it is said that the matching rows are post filtered while scanning the partition.
> > > > > Can you confirm or deny that SAI is efficient with wide partitions and provides the partition offsets to the matching rows?
> > > > >
> > > > > 2) About space efficiency, one of the biggest drawbacks of SASI was the huge space required for index structure when using CONTAINS logic because of the decomposition of text columns into n-grams. Will SAI suffer from the same issue in future iterations? I'm anticipating a bit
> > > > >
> > > > > 3) If I'm querying using SAI and providing complete partition key, will it be more efficient than querying without partition key? In other words, does SAI provide any optimisation when partition key is specified?
> > > > >
> > > > > Regards
> > > > >
> > > > > Duy Hai DOAN
> > > > >
> > > > > On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever <m...@apache.org> wrote:
> > > > >
> > > > > We are looking forward to the community's feedback and suggestions.
> > > > >
> > > > > What comes immediately to mind is testing requirements.
> > > > > It has been mentioned already that the project's testability and QA guidelines are inadequate to successfully introduce new features and refactorings to the codebase. During the 4.0 beta phase this was intended to be addressed, i.e. defining more specific QA guidelines for 4.0-rc. This would be an important step towards QA guidelines for all changes and CEPs post-4.0.
> > > > >
> > > > > Questions from me
> > > > > - How will this be tested, how will its QA status and lifecycle be defined? (per above)
> > > > > - With existing C* code needing to be changed, what is the proposed plan for making those changes ensuring maintained QA, e.g. is there separate QA cycles planned for altering the SPI before adding a new SPI implementation?
> > > > > - Despite being out of scope, it would be nice to have some idea from the CEP author of when users might still choose afresh 2i or SASI over SAI,
> > > > > - Who fills the roles involved? Who are the contributors in this DataStax team? Who is the shepherd? Are there other stakeholders willing to be involved?
> > > > > - Is there a preference to use gdoc instead of the project's wiki, and why? (the CEP process suggest a wiki page, and feedback on why another approach is considered better helps evolve the CEP process itself)
> > > > >
> > > > > cheers,
> > > > > Mick
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> > --
> > alex p