Re: [DISCUSS] CEP-7 Storage Attached Index

Jasonstack Zhao Yang Wed, 23 Sep 2020 21:53:39 -0700

>> I think CEP should be more upfront with "eventually replace it" bit,
>> since it raises the question about what the people who are using other
>> index implementations can expect.


Will update the CEP to emphasize: SAI will replace other indexes.

>> Unfortunately, I do not have an implementation sitting around for a
>> direct comparison, but I can imagine situations when B-Trees may perform
>> better because of simpler construction. Maybe we should even consider
>> prototyping a prefix B-Tree to have a more fair comparison.

As long as the prefix B-Tree supports range/prefix aggregation (which is used
to speed up range/prefix queries when an entire subtree matches), we can plug
it in and compare. It won't affect the CEP design, which focuses on sharing
data across indexes and posting aggregation.
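
To make "range/prefix aggregation" concrete, here is a minimal Python sketch
(an illustration only, not SAI's actual on-disk format). Every node keeps the
union of the row ids stored at or below it, so a prefix query that matches an
entire subtree is answered from that node's aggregate without descending any
further. TrieNode, AggregatingTrie and prefix_query are hypothetical names made
up for this example, and keeping a full set per node is a simplification for
brevity.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by the next character of the term.
    children: dict = field(default_factory=dict)
    # Aggregate: row ids of every term stored at or below this node.
    postings: set = field(default_factory=set)


class AggregatingTrie:
    """Toy in-memory trie with per-subtree posting aggregation (hypothetical)."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, term, row_id):
        # Record the row id on every node along the path, root included,
        # so each node's aggregate covers its whole subtree.
        node = self.root
        node.postings.add(row_id)
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
            node.postings.add(row_id)

    def prefix_query(self, prefix):
        # Walk down to the node for the prefix; if the path is missing,
        # nothing matches.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        # The entire subtree matches, so the aggregate is the answer.
        return set(node.postings)


index = AggregatingTrie()
for row_id, term in enumerate(["apple", "apply", "apt", "bat"]):
    index.add(term, row_id)
```

With these four terms, index.prefix_query("app") returns the row ids of
"apple" and "apply" after visiting only three nodes below the root. A prefix
B-Tree that exposes the same per-subtree aggregate could sit behind the same
query method, which is what would make a side-by-side comparison possible.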

But for improving overall index read performance, I think improving base
table read performance is more effective than switching from a trie to a
prefix B-Tree, because SAI/SASI executes LOTS of SinglePartitionReadCommands
after searching the on-disk index.
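
A toy model of that fan-out, as a sketch only: plain dicts stand in for the
index and the base table, and run_indexed_query is a hypothetical helper, not
Cassandra code. One index search turns into one single-partition read per
matching key, so the number of base table reads grows with the number of
matches no matter which on-disk structure produced the keys.

```python
def run_indexed_query(index, base_table, value):
    """Return the matching rows and the number of base-table reads issued."""
    # Step 1: a single index search yields the matching partition keys.
    matching_keys = index.get(value, [])

    # Step 2: each key costs one base-table read (this dict lookup stands in
    # for one single-partition read).
    rows = []
    reads = 0
    for key in matching_keys:
        rows.append(base_table[key])
        reads += 1
    return rows, reads


# Hypothetical data set: 1000 partitions, half of them with city == "Oslo".
base_table = {
    f"pk{i}": {"pk": f"pk{i}", "city": "Oslo" if i % 2 == 0 else "Rome"}
    for i in range(1000)
}

# Build a trivial value -> partition keys index over the "city" column.
index = {}
for key, row in base_table.items():
    index.setdefault(row["city"], []).append(key)

rows, reads = run_indexed_query(index, base_table, "Oslo")
```

Here a single index lookup for "Oslo" triggers 500 base table reads, which is
why the cost of each of those reads matters more than the cost of the lookup.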



On Thu, 24 Sep 2020 at 05:33, Benedict Elliott Smith <bened...@apache.org>
wrote:

> FWIW, I personally look forward to receiving that contribution when the
> time is right.
>
> On 23/09/2020, 18:45, "Josh McKenzie" <jmcken...@apache.org> wrote:
>
>     > talking about that would involve some bits of information DataStax
>     > might not be ready to share?
>
>     At the risk of derailing, I've been poking and prodding this week at we
>     contributors at DS getting our act together w/a draft CEP for donating
>     the trie-based indices to the ASF project.
>
>     More to come; the intention is certainly to contribute that code. The
>     lack of a destination to merge it into (i.e. no 5.0-dev branch) is
>     removing significant urgency from the process as well (not to open a
>     3rd Pandora's box), but there's certainly an interrelatedness to the
>     conversations going on.
>
>     ---
>     Josh McKenzie
>
>
>
>     On Wed, Sep 23, 2020 at 12:48 PM, Caleb Rackliffe
>     <calebrackli...@gmail.com> wrote:
>
>     > As long as we can construct the on-disk indexes efficiently/directly
>     > from a Memtable-attached index on flush, there's room to try other
>     > data structures. Most of the innovation in SAI is around the layout of
>     > postings (something we can expand on if people are interested) and
>     > having a natively row-oriented design that scales w/ multiple indexed
>     > columns on single SSTables. There are some broader implications of
>     > using the trie that reach outside SAI itself, but talking about that
>     > would involve some bits of information DataStax might not be ready to
>     > share?
>     >
>     > On Wed, Sep 23, 2020 at 11:00 AM Jeremiah D Jordan
>     > <jeremiah.jordan@gmail.com> wrote:
>     >
>     > > Short question: looking forward, how are we going to maintain three
>     > > 2i implementations: SASI, SAI, and 2i?
>     >
>     > I think one of the goals stated in the CEP is for SAI to have parity
>     > with 2i such that it could eventually replace it.
>     >
>     > On Sep 23, 2020, at 10:34 AM, Oleksandr Petrov
>     > <oleksandr.pet...@gmail.com> wrote:
>     >
>     > Short question: looking forward, how are we going to maintain three
>     > 2i implementations: SASI, SAI, and 2i?
>     >
>     > Another thing I think this CEP is missing is rationale and motivation
>     > about why trie-based indexes were chosen over, say, B-Tree. We did
>     > have a short discussion about this on Slack, but both arguments that
>     > I've heard (space-saving and keeping a small subset of nodes in
>     > memory) work only for the most primitive implementation of a B-Tree.
>     > Fully-occupied prefix B-Tree can have similar properties. There's been
>     > a lot of research on B-Trees and optimisations in those.
>     > Unfortunately, I do not have an implementation sitting around for a
>     > direct comparison, but I can imagine situations when B-Trees may
>     > perform better because of simpler construction.
>     >
>     > Maybe we should even consider prototyping a prefix B-Tree to have a
>     > more fair comparison.
>     >
>     > Thank you,
>     > -- Alex
>     >
>     > On Thu, Sep 10, 2020 at 9:12 AM Jasonstack Zhao Yang
>     > <jasonstack.zhao@gmail.com> wrote:
>     >
>     > Thank you Patrick for hosting Cassandra Contributor Meeting for CEP-7
>     > SAI.
>     >
>     > The recorded video is available here:
>     >
>     > https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-01+Apache+Cassandra+Contributor+Meeting
>     >
>     > On Tue, 1 Sep 2020 at 14:34, Jasonstack Zhao Yang
>     > <jasonstack.zhao@gmail.com> wrote:
>     >
>     > Thank you, Charles and Patrick
>     >
>     > On Tue, 1 Sep 2020 at 04:56, Charles Cao <caohair...@gmail.com> wrote:
>     >
>     > Thank you, Patrick!
>     >
>     > On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin <pmcfa...@gmail.com>
>     > wrote:
>     >
>     > I just moved it to 8AM for this meeting to better accommodate APAC.
>     > Please see the update here:
>     >
>     > https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>     >
>     > Patrick
>     >
>     > On Mon, Aug 31, 2020 at 10:04 AM Charles Cao <caohair...@gmail.com>
>     > wrote:
>     >
>     > Patrick,
>     >
>     > 11AM PST is a bad time for the people in the APAC timezone. Can we
>     > move it to 7 or 8AM PST in the morning to accommodate their needs?
>     >
>     > ~Charles
>     >
>     > On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin <pmcfa...@gmail.com>
>     > wrote:
>     >
>     > Meeting scheduled.
>     >
>     > https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>     >
>     > Tuesday September 1st, 11AM PST. I added a basic bullet for the agenda
>     > but if there is more, edit away.
>     >
>     > Patrick
>     >
>     > On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang
>     > <jasonstack.zhao@gmail.com> wrote:
>     >
>     > +1
>     >
>     > On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova
>     > <e.dimitr...@gmail.com> wrote:
>     >
>     > +1
>     >
>     > On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe
>     > <calebrackli...@gmail.com> wrote:
>     >
>     > +1
>     >
>     > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin <pmcfa...@gmail.com>
>     > wrote:
>     >
>     > This is related to the discussion Jordan and I had about the
>     > contributor Zoom call. Instead of open mic for any issue, call it
>     > based on a discussion thread or threads for higher bandwidth
>     > discussion.
>     >
>     > I would be happy to schedule one for next week to specifically discuss
>     > CEP-7. I can attach the recorded call to the CEP after.
>     >
>     > +1 or -1?
>     >
>     > Patrick
>     >
>     > On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie <jmcken...@apache.org>
>     > wrote:
>     >
>     > > Does community plan to open another discussion or CEP on
>     > > modularization?
>     >
>     > We probably should have a discussion on the ML or monthly contrib call
>     > about it first to see how aligned the interested contributors are.
>     > Could do that through CEP as well but CEP's (at least thus far sans
>     > k8s operator) tend to start with a strong, deeply thought out point of
>     > view being expressed.
>     >
>     > On Tue, Aug 25, 2020 at 3:26 AM Jasonstack Zhao Yang
>     > <jasonstack.z...@gmail.com> wrote:
>     >
>     > > SASI's performance, specifically the search in the B+ tree
>     > > component, depends a lot on the component file's header being
>     > > available in the pagecache. SASI benefits from (needs) nodes with
>     > > lots of RAM. Is SAI bound to this same or similar limitation?
>     >
>     > SAI also benefits from larger memory because SAI puts block info on
>     > heap for searching on-disk components and having cross-index files on
>     > page cache improves read performance of different indexes on the same
>     > table.
>     >
>     > > Flushing of SASI can be CPU+IO intensive, to the point of
>     > > saturation, pauses, and crashes on the node. SSDs are a must, along
>     > > with a bit of tuning, just to avoid bringing down your cluster.
>     > > Beyond reducing space requirements, does SAI improve on these
>     > > things? Like SASI how does SAI, in its own way, change/narrow the
>     > > recommendations on node hardware specs?
>     >
>     > SAI won't crash the node during compaction and requires less CPU/IO.
>     >
>     > * SAI defines global memory limit for compaction instead of per-index
>     >   memory limit used by SASI. For example, compactions are running on
>     >   10 tables and each has 10 indexes. SAI will cap the memory usage
>     >   with global limit while SASI may use up to 100 * per-index limit.
>     >
>     > * After flushing in-memory segments to disk, SAI won't merge on-disk
>     >   segments while SASI attempts to merge them at the end. There are
>     >   pros and cons of not merging segments:
>     >   ** Pros: compaction runs faster and requires fewer resources.
>     >   ** Cons: small segments reduce compression ratio.
>     >
>     > * SAI on-disk format with row ids compresses better.
>     >
>     > > I understand the desire in keeping out of scope the longer term
>     > > deprecation and migration plan, but… if SASI provides functionality
>     > > that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet
>     > > introduces a body of code ~somewhat similar, shouldn't we be
>     > > roughly sketching out how to reduce the maintenance surface area?
>     >
>     > Agreed that we should reduce maintenance area if possible, but only
>     > very limited code base (eg. RangeIterator, QueryPlan) can be shared.
>     > The rest of the code base is quite different because of on-disk
>     > format and cross-index files.
>     >
>     > The goal of this CEP is to get community buy-in on SAI's design.
>     > Tokenization, DelimiterAnalyzer should be straightforward to
>     > implement on top of SAI.
>     >
>     > > Can we list what configurations of SASI will become deprecated once
>     > > SAI becomes non-experimental?
>     >
>     > Except for "Like", "Tokenisation", "DelimiterAnalyzer", the rest of
>     > SASI can be replaced by SAI.
>     >
>     > > Given a few bugs are open against 2i and SASI, can we provide some
>     > > overview, or rough indication, of how many of them we could "triage
>     > > away"?
>     >
>     > I believe most of the known bugs in 2i/SASI either have been addressed
>     > in SAI or don't apply to SAI.
>     >
>     > > And, is it time for the project to start introducing new SPI
>     > > implementations as separate sub-modules and jar files that are only
>     > > loaded at runtime based on configuration settings? (sorry for the
>     > > conflation on this one, but maybe it's the right time to raise it
>     > > :shrug:)
>     >
>     > Agreed that modularization is the way to go and will speed up module
>     > development speed.
>     >
>     > Does community plan to open another discussion or CEP on
>     > modularization?
>     >
>     > On Mon, 24 Aug 2020 at 16:43, Mick Semb Wever <m...@apache.org>
>     > wrote:
>     >
>     > Adding to Duy's questions…
>     >
>     > * Hardware specs
>     >
>     > SASI's performance, specifically the search in the B+ tree component,
>     > depends a lot on the component file's header being available in the
>     > pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI
>     > bound to this same or similar limitation?
>     >
>     > Flushing of SASI can be CPU+IO intensive, to the point of saturation,
>     > pauses, and crashes on the node. SSDs are a must, along with a bit of
>     > tuning, just to avoid bringing down your cluster. Beyond reducing
>     > space requirements, does SAI improve on these things? Like SASI how
>     > does SAI, in its own way, change/narrow the recommendations on node
>     > hardware specs?
>     >
>     > * Code Maintenance
>     >
>     > I understand the desire in keeping out of scope the longer term
>     > deprecation and migration plan, but… if SASI provides functionality
>     > that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet
>     > introduces a body of code ~somewhat similar, shouldn't we be roughly
>     > sketching out how to reduce the maintenance surface area?
>     >
>     > Can we list what configurations of SASI will become deprecated once
>     > SAI becomes non-experimental?
>     >
>     > Given a few bugs are open against 2i and SASI, can we provide some
>     > overview, or rough indication, of how many of them we could "triage
>     > away"?
>     >
>     > And, is it time for the project to start introducing new SPI
>     > implementations as separate sub-modules and jar files that are only
>     > loaded at runtime based on configuration settings? (sorry for the
>     > conflation on this one, but maybe it's the right time to raise it
>     > :shrug:)
>     >
>     > regards,
>     >
>     > Mick
>     >
>     > On Tue, 18 Aug 2020 at 13:05, DuyHai Doan <doanduy...@gmail.com>
>     > wrote:
>     >
>     > Thank you Zhao Yang for starting this topic
>     >
>     > After reading the short design doc, I have a few questions
>     >
>     > 1) SASI was pretty inefficient indexing wide partitions because the
>     > index structure only retains the partition token, not the clustering
>     > columns. As per design doc SAI has row id mapping to partition offset,
>     > can we hope that indexing wide partition will be more efficient with
>     > SAI? One detail that worries me is that in the beginning of the design
>     > doc, it is said that the matching rows are post filtered while
>     > scanning the partition. Can you confirm or refute that SAI is
>     > efficient with wide partitions and provides the partition offsets to
>     > the matching rows?
>     >
>     > 2) About space efficiency, one of the biggest drawbacks of SASI was
>     > the huge space required for index structure when using CONTAINS logic
>     > because of the decomposition of text columns into n-grams. Will SAI
>     > suffer from the same issue in future iterations? I'm anticipating a
>     > bit
>     >
>     > 3) If I'm querying using SAI and providing complete partition key,
>     > will it be more efficient than querying without partition key? In
>     > other words, does SAI provide any optimisation when partition key is
>     > specified?
>     >
>     > Regards
>     >
>     > Duy Hai DOAN
>     >
>     > On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever <m...@apache.org>
>     > wrote:
>     >
>     > > We are looking forward to the community's feedback and suggestions.
>     >
>     > What comes immediately to mind is testing requirements. It has been
>     > mentioned already that the project's testability and QA guidelines are
>     > inadequate to successfully introduce new features and refactorings to
>     > the codebase. During the 4.0 beta phase this was intended to be
>     > addressed, i.e. defining more specific QA guidelines for 4.0-rc. This
>     > would be an important step towards QA guidelines for all changes and
>     > CEPs post-4.0.
>     >
>     > Questions from me
>     >
>     > - How will this be tested, how will its QA status and lifecycle be
>     >   defined? (per above)
>     >
>     > - With existing C* code needing to be changed, what is the proposed
>     >   plan for making those changes ensuring maintained QA, e.g. is there
>     >   separate QA cycles planned for altering the SPI before adding a new
>     >   SPI implementation?
>     >
>     > - Despite being out of scope, it would be nice to have some idea from
>     >   the CEP author of when users might still choose afresh 2i or SASI
>     >   over SAI,
>     >
>     > - Who fills the roles involved? Who are the contributors in this
>     >   DataStax team? Who is the shepherd? Are there other stakeholders
>     >   willing to be involved?
>     >
>     > - Is there a preference to use gdoc instead of the project's wiki, and
>     >   why? (the CEP process suggest a wiki page, and feedback on why
>     >   another approach is considered better helps evolve the CEP process
>     >   itself)
>     >
>     > cheers,
>     >
>     > Mick
>     >
>     > ---------------------------------------------------------------------
>     >
>     > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For
>     > additional commands, e-mail: dev-h...@cassandra.apache.org
>     >
>     >
>     >
>     > --
>     > alex p
>     >
>     >
>     >
>     >
>
>
>
>
>
