Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Jasonstack Zhao Yang
>> I think CEP should be more upfront with "eventually replace
>> it" bit, since it raises the question about what the people who are using
>> other index implementations can expect.

Will update the CEP to emphasize: SAI will replace other indexes.

>> Unfortunately, I do not have an
>> implementation sitting around for a direct comparison, but I can imagine
>> situations when B-Trees may perform better because of simpler construction.
>> Maybe we should even consider prototyping a prefix B-Tree to have a more
>> fair comparison.

As long as a prefix B-Tree supports range/prefix aggregation (which is used
to speed up range/prefix queries when an entire subtree matches), we can
plug it in and compare. It won't affect the CEP design, which focuses on
sharing data across indexes and posting aggregation.
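
To illustrate what range/prefix aggregation means here, a minimal Python
sketch follows. It is not SAI's on-disk format; the class and method names
are hypothetical and the postings are plain in-memory sets. Each node keeps
the union of the row ids stored beneath it, so a prefix query that matches a
whole subtree can return that node's aggregate without walking the subtree.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by the next character of the term.
    children: dict = field(default_factory=dict)
    # Union of row ids for every term stored at or below this node.
    aggregated_postings: set = field(default_factory=set)


class AggregatingTrie:
    """Hypothetical in-memory trie with per-node aggregated postings."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, term: str, row_id: int) -> None:
        # Record the row id on every node along the term's path.
        node = self.root
        node.aggregated_postings.add(row_id)
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
            node.aggregated_postings.add(row_id)

    def prefix_query(self, prefix: str) -> set:
        # Walk down to the node for the prefix; its aggregate already
        # covers the entire matching subtree.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.aggregated_postings)


index = AggregatingTrie()
index.add("apple", 1)
index.add("apply", 2)
index.add("banana", 3)
```

Any structure that can keep such per-subtree aggregates, a prefix B-Tree
included, could answer the same query the same way.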

But for improving overall index read performance, I think improving base
table read performance (because SAI/SASI executes LOTS of
SinglePartitionReadCommands after searching the on-disk index) is more
effective than switching from a trie to a prefix B-Tree.




Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Benedict Elliott Smith
FWIW, I personally look forward to receiving that contribution when the time is 
right.

On 23/09/2020, 18:45, "Josh McKenzie"  wrote:

talking about that would involve some bits of information DataStax might
not be ready to share?

At the risk of derailing, I've been poking and prodding this week at we
contributors at DS getting our act together w/a draft CEP for donating the
trie-based indices to the ASF project.

More to come; the intention is certainly to contribute that code. The lack
of a destination to merge it into (i.e. no 5.0-dev branch) is removing
significant urgency from the process as well (not to open a 3rd Pandora's
box), but there's certainly an interrelatedness to the conversations going
on.

---
Josh McKenzie



TLS protocol configuration for secure internode messaging needs improvement before the final 4.0 release.

2020-09-23 Thread Jon Meredith
tl;dr Setting encryption_options.protocol does not control which TLS
protocols are accepted; only restricting cipher_suites by protocol
does. I think we should fix encryption_options.protocol to actually
restrict them, and I have a proposal to do so at the end of the email.


I've been investigating restricting the TLS protocols to prevent use
of TLSv1 & TLSv1.1 for secure internode messaging and streaming
connections and think the current implementation needs improvement
before the final 4.0 release.

The Apache Cassandra documentation page on security
https://cassandra.apache.org/doc/latest/operating/security.html
mentions

"...  the JVM defaults for supported protocols and cipher suites are
used when encryption is enabled. These can be overidden using the
settings in cassandra.yaml, but this is not recommended unless there
are policies in place which dictate certain settings or a need to
disable vulnerable ciphers or protocols in cases where the JVM cannot
be updated."

The implication to me there is that the preferred mechanism is to
configure the JSSE subsystem. Trawling through documentation, the
operator can disable older TLS protocols at the JVM level by creating
a new security properties file

$ cat conf/cassandra-security.properties
jdk.tls.disabledAlgorithms=SSLv3, RC4, DES, MD5withRSA, DH keySize < 1024, \
EC keySize < 224, 3DES_EDE_CBC, anon, NULL, TLSv1, TLSv1.1

And appending to the current security properties using

  -Djava.security.properties=conf/cassandra-security.properties.

This works fine pre-4.0. However, with the introduction of Netty tcnative,
which uses OpenSSL under the hood, java.security.properties is no longer
used to restrict anything. Neither does tcnative implement the calls for
supporting the OpenSSL configuration file. It only seems possible to
restrict the protocol & ciphers through the Netty SSLContext API. It is
possible to disable OpenSSL by setting the Java system property
cassandra.disable_tcactive_openssl=true, but it seems undesirable to lose
the performance benefit there.

Looking in cassandra.yaml, under 'More advanced defaults' there is a
'protocol' setting, which an operator might expect to restrict which TLS
protocols are accepted.

# More advanced defaults:
# protocol: TLS

However, setting that to TLSv1.2 had no effect on the protocols the
server accepted. Running "openssl s_client -tlsv1 -connect
127.0.0.1:7000" will connect without issue and negotiate a TLSv1.0
session.

I found two previous tickets that addressed TLS protocols: the first
explicitly hard-coded the accepted TLS protocols to disable SSLv3
(due to POODLE) in CASSANDRA-8265 /
b93f48a5db321bf7c9fb55a800ed6ab2d6f6b102, and the second went back to
relying on the Java 8 defaults in CASSANDRA-10508 /
e4a0a4bf65a87c3aabae4ee0cc35009879e2d455 once they were fixed.

CASSANDRA-10508 mentions the 'protocol' field as a mechanism for
specifying the protocol; however, according to the Java docs, that only
verifies that the protocol is supported by the SSL engine and does not
restrict negotiation to it, as the openssl s_client test
demonstrates.

From a quick search of the internet, a couple of blog posts recommend
setting the cipher suite to only TLSv1.2 valid ciphers, and I can
confirm that does work, leading to this being logged (at ERROR).

ERROR [Messaging-EventLoop-3-2] 2020-09-19T16:17:48,023 : - Failed to
properly handshake with peer /127.0.0.1:33826. Closing the channel.
io.netty.handler.codec.DecoderException:
javax.net.ssl.SSLHandshakeException: Client requested protocol TLSv1.1
is not enabled or supported in server context
Caused by: javax.net.ssl.SSLHandshakeException: Client requested
protocol TLSv1.1 is not enabled or supported in server context

While restricting the cipher suites does work to restrict the protocol,
if we start logging the accepted protocols, the log will show that the
server will negotiate TLSv1/TLSv1.1, which may get flagged by anybody
validating the operator's secure connection configuration.

I also discovered that if SSL is misconfigured (ciphers, keystore,
truststore etc.), the node will start up happily but be unable to
accept or make any secure internode connections.

The current state of the code and documentation is unsatisfactory to
me.  We should at least improve the documentation to give clear
guidance to operators on how they can secure their systems under
4.0/tcnative; however, I think we should go further and make the
encryption_options.protocol field behave as intended.

Here's my proposal:

1) Interpret the current protocol string as a comma separated list of
protocols that are accepted. Replace the default
EncryptionOptions.protocol of "TLS" with null.
2) If protocol is non-null, call SslContextBuilder.protocols() with
the configured protocols in
org.apache.cassandra.security.SSLFactory#createNettySslContext
3) Special case the protocol configuration "TLS" to mean {"TLSv1",
"TLSv1.1", "TLSv1.2"} for users that have uncommented the default
value. Passing "TLS" is invalid in the protocols() call.
4) Hard-code 
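
As a rough illustration of steps 1 to 3 only (step 4 is cut off in this
archive and is not covered), the interpretation of the protocol string could
be sketched as below. This is a Python sketch of the parsing rule, not the
Java change itself; the function and constant names are hypothetical, and in
Cassandra the resulting list would be what gets handed to
SslContextBuilder.protocols() in SSLFactory#createNettySslContext.

```python
from typing import List, Optional

# Expansion for the legacy default value "TLS", as listed in step 3.
LEGACY_TLS_EXPANSION = ["TLSv1", "TLSv1.1", "TLSv1.2"]


def accepted_protocols(protocol: Optional[str]) -> Optional[List[str]]:
    """Turn the configured protocol string into a list of protocols.

    None means "not configured": the caller would skip the protocols()
    call and leave the engine defaults in place (steps 1 and 2).
    """
    if protocol is None:
        return None
    # Step 3: "TLS" on its own is not a valid protocol name to pass on,
    # so expand it for users who uncommented the old default.
    if protocol.strip() == "TLS":
        return list(LEGACY_TLS_EXPANSION)
    # Step 1: otherwise treat the value as a comma-separated list.
    return [p.strip() for p in protocol.split(",") if p.strip()]
```

With this rule, "TLSv1.2" on its own yields a single-entry list, so an
operator who sets it would no longer accept a TLSv1.0 handshake such as the
one in the openssl s_client test above.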

Re: [DISCUSS] Next steps for Kubernetes operator SIG

2020-09-23 Thread Benedict Elliott Smith
Perhaps it helps to widen the field of discussion to the dev list?

It might help if each of the stakeholder organisations state their view on the 
situation, including why they would or would not support a given 
approach/operator, and what (preferably specific) circumstances might lead them 
to change their mind?

I realise there are meeting logs, but getting a wider discourse with 
non-stakeholder input might help to build a community consensus?  It doesn't 
seem like it can hurt at this point, anyway.


On 23/09/2020, 17:13, "John Sanda"  wrote:

I want to point out that pretty much everything being discussed in this
thread has been discussed at length during the SIG meetings. I think it is
worth noting because we are pretty much still having the same conversation.

On Wed, Sep 23, 2020 at 12:03 PM Benedict Elliott Smith 

wrote:

> I don't think there's anything about a code drop that's not "The Apache
> Way"
>
> If there's a consensus (or even strong majority) amongst invested parties,
> I don't see why we could not adopt an operator directly into the project.
>
> It's possible a green field approach might lead to fewer hard feelings, as
> everyone is in the same boat. Perhaps all operators are also suboptimal and
> could be improved with a rewrite? But I think coordinating a lot of
> different entities around an empty codebase is particularly challenging.  I
> actually think it could be better for cohesion and collaboration to have a
> suboptimal but substantive starting point.
>
>
> On 23/09/2020, 16:11, "Stefan Miklosovic" <
> stefan.mikloso...@instaclustr.com> wrote:
>
> I think that from Instaclustr it was stated quite clearly multiple
> times that we are "fine to throw it away" if there is something better
> and more widespread. Indeed, we have invested a lot of time in the
> operator, but it was not useless at all; we gained a lot of quite
> unique knowledge of how to put all the pieces together. However, I think that
> this space is going to be quite fragmented and "balkanized", which is
> not always a bad thing, but in a quite narrow area as Kubernetes
> operator is, I just do not see how 4 operators are going to be
> beneficial for ordinary people ("official" from community, ours,
> Datastax one and CassKop (without any significant order)). Sure,
> innovation and healthy competition is important but to what extent ...
> One can start a Cassandra cluster on Kubernetes just so many times
> differently and nobody really likes a vendor lock-in. People wanting
> to run a cluster on K8S realise that there are three operators, each
> backed by a private business entity, and the community operator is not
> there ... Huh, interesting ... One may even start to question what is
> wrong with these folks that it takes three companies to build their
> own solution.
>
> Having said that, to my perception, Cassandra community just does not
> have enough engineers nor contributors to keep 4 operators alive at
> the same time (I wish I was wrong) so the idea of selecting the best
> one or to merge obvious things and approaches together is
> understandable, even if it meant we eventually sunset ours. In
> addition, nobody from big players is going to contribute to the code
> base of the other one, for obvious reasons, so channeling and
> directing this effort into something common for a community seems to
> be the only reasonable way of cooperation.
>
> It is quite hard to bootstrap this if the donation of the code in big
> chunks / whole repo is out of question as it is not the "Apache way"
> (there was some thread running here about this in more depth a while
> ago) and we basically need to start from scratch, which is quite
> demotivating; we are just reinventing the wheel and nobody is up to it.
> It is like people are waiting for that to happen so they can jump in
> "once it is the thing" but it will never materialise or at least the
> hurdle to kick it off is unnecessarily high. Nobody is going to invest
> in this heavily if there is already a working operator from companies
> mentioned above. As I understood it, one reason for not choosing the
> way of donating it all is that "the learning and community building
> should happen in organic manner and we just can not accept the
> donation", but isn't it true that it is easier to build a community
> around something which is already there rather than trying to build it
> around an idea which is quite hard to dedicate oneself to?
>
> On Wed, 23 Sep 2020 at 15:28, Joshua McKenzie 
> wrote:
> >
> > > I think there's 

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Josh McKenzie
> talking about that would involve some bits of information DataStax might
> not be ready to share?

At the risk of derailing, I've been poking and prodding this week at we
contributors at DS getting our act together w/a draft CEP for donating the
trie-based indices to the ASF project.

More to come; the intention is certainly to contribute that code. The lack
of a destination to merge it into (i.e. no 5.0-dev branch) is removing
significant urgency from the process as well (not to open a 3rd Pandora's
box), but there's certainly an interrelatedness to the conversations going
on.

---
Josh McKenzie



Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Caleb Rackliffe
As long as we can construct the on-disk indexes efficiently/directly from a
Memtable-attached index on flush, there's room to try other data
structures. Most of the innovation in SAI is around the layout of postings
(something we can expand on if people are interested) and having a natively
row-oriented design that scales w/ multiple indexed columns on single
SSTables. There are some broader implications of using the trie that reach
outside SAI itself, but talking about that would involve some bits of
information DataStax might not be ready to share?

On Wed, Sep 23, 2020 at 11:00 AM Jeremiah D Jordan <
jeremiah.jor...@gmail.com> wrote:

> > Short question: looking forward, how are we going to maintain three 2i
> > implementations: SASI, SAI, and 2i?
>
> I think one of the goals stated in the CEP is for SAI to have parity with
> 2i such that it could eventually replace it.
>
>
> > On Sep 23, 2020, at 10:34 AM, Oleksandr Petrov <
> oleksandr.pet...@gmail.com> wrote:
> >
> > Short question: looking forward, how are we going to maintain three 2i
> > implementations: SASI, SAI, and 2i?
> >
> > Another thing I think this CEP is missing is rationale and motivation
> > about why trie-based indexes were chosen over, say, B-Tree. We did have a
> > short discussion about this on Slack, but both arguments that I've heard
> > (space-saving and keeping a small subset of nodes in memory) work only
> for
> > the most primitive implementation of a B-Tree. Fully-occupied prefix
> B-Tree
> > can have similar properties. There's been a lot of research on B-Trees
> and
> > optimisations in those. Unfortunately, I do not have an
> > implementation sitting around for a direct comparison, but I can imagine
> > situations when B-Trees may perform better because of simpler
> construction.
> > Maybe we should even consider prototyping a prefix B-Tree to have a more
> > fair comparison.
> >
> > Thank you,
> > -- Alex

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Oleksandr Petrov
I did see a bit about "future parity and beyond" which is more or less an
obvious goal. I think CEP should be more upfront with "eventually replace
it" bit, since it raises the question about what the people who are using
other index implementations can expect.

On Wed, Sep 23, 2020 at 6:00 PM Jeremiah D Jordan 
wrote:

> > Short question: looking forward, how are we going to maintain three 2i
> > implementations: SASI, SAI, and 2i?
>
> I think one of the goals stated in the CEP is for SAI to have parity with
> 2i such that it could eventually replace it.
>
>
> > On Sep 23, 2020, at 10:34 AM, Oleksandr Petrov <
> oleksandr.pet...@gmail.com> wrote:
> >
> > Short question: looking forward, how are we going to maintain three 2i
> > implementations: SASI, SAI, and 2i?
> >
> > Another thing I think this CEP is missing is rationale and motivation
> > about why trie-based indexes were chosen over, say, B-Tree. We did have a
> > short discussion about this on Slack, but both arguments that I've heard
> > (space-saving and keeping a small subset of nodes in memory) work only
> for
> > the most primitive implementation of a B-Tree. Fully-occupied prefix
> B-Tree
> > can have similar properties. There's been a lot of research on B-Trees
> and
> > optimisations in those. Unfortunately, I do not have an
> > implementation sitting around for a direct comparison, but I can imagine
> > situations when B-Trees may perform better because of simpler
> construction.
> > Maybe we should even consider prototyping a prefix B-Tree to have a more
> > fair comparison.
> >
> > Thank you,
> > -- Alex
> >
> >
> >

Re: [DISCUSS] Next steps for Kubernetes operator SIG

2020-09-23 Thread John Sanda
I want to point out that pretty much everything being discussed in this
thread has been discussed at length during the SIG meetings. I think it is
worth noting because we are pretty much still having the same conversation.

On Wed, Sep 23, 2020 at 12:03 PM Benedict Elliott Smith 
wrote:

> I don't think there's anything about a code drop that's not "The Apache
> Way"
>
> If there's a consensus (or even strong majority) amongst invested parties,
> I don't see why we could not adopt an operator directly into the project.
>
> It's possible a green field approach might lead to fewer hard feelings, as
> everyone is in the same boat. Perhaps all operators are also suboptimal and
> could be improved with a rewrite? But I think coordinating a lot of
> different entities around an empty codebase is particularly challenging.  I
> actually think it could be better for cohesion and collaboration to have a
> suboptimal but substantive starting point.
>
>
> On 23/09/2020, 16:11, "Stefan Miklosovic" <
> stefan.mikloso...@instaclustr.com> wrote:
>
> I think that from Instaclustr it was stated quite clearly multiple
> times that we are "fine to throw it away" if there is something better
> and more wide-spread. Indeed, we have invested a lot of time in the
> operator but it was not useless at all, we gained a lot of quite
> unique knowledge how to put all pieces together. However, I think that
> this space is going to be quite fragmented and "balkanized", which is
> not always a bad thing, but in a quite narrow area as Kubernetes
> operator is, I just do not see how 4 operators are going to be
> beneficial for ordinary people ("official" from community, ours,
> Datastax one and CassKop (without any significant order)). Sure,
> innovation and healthy competition is important but to what extent ...
> One can start a Cassandra cluster on Kubernetes just so many times
> differently and nobody really likes a vendor lock-in. People wanting
> to run a cluster on K8S realise that there are three operators, each
> backed by a private business entity, and the community operator is not
> there ... Huh, interesting ... One may even start to question what is
> wrong with these folks that it takes three companies to build their
> own solution.
>
> Having said that, to my perception, Cassandra community just does not
> have enough engineers nor contributors to keep 4 operators alive at
> the same time (I wish I was wrong) so the idea of selecting the best
> one or to merge obvious things and approaches together is
> understandable, even if it meant we eventually sunset ours. In
> addition, nobody from big players is going to contribute to the code
> base of the other one, for obvious reasons, so channeling and
> directing this effort into something common for a community seems to
> be the only reasonable way of cooperation.
>
> It is quite hard to bootstrap this if the donation of the code in big
> chunks / whole repo is out of question as it is not the "Apache way"
> (there was some thread running here about this in more depth a while
> ago) and we basically need to start from scratch which is quite
> demotivating, we are just inventing the wheel and nobody is up to it.
> It is like people are waiting for that to happen so they can jump in
> "once it is the thing" but it will never materialise or at least the
> hurdle to kick it off is unnecessarily high. Nobody is going to invest
> in this heavily if there is already a working operator from companies
> mentioned above. As I understood it, one reason of not choosing the
> way of donating it all is that "the learning and community building
> should happen in organic manner and we just can not accept the
> donation", but is not it true that it is easier to build a community
> around something which is already there rather than trying to build it
> around an idea which is quite hard to dedicate to?
>
> On Wed, 23 Sep 2020 at 15:28, Joshua McKenzie 
> wrote:
> >
> > > I think there's significant value to the community in trying to coalesce
> > > on a single approach,
> > I agree. Unfortunately in this case, the parties with a vested interest and
> > written operators came to the table and couldn't agree to coalesce on a
> > single approach. John Sanda attempted to start an initiative to write a
> > best-of-breed combining choice parts of each operator, but that effort did
> > not gain traction.
> >
> > Which is where my hypothesis comes from that if there were a clear "better
> > fit" operator to start from we wouldn't be in a deadlock; the correct
> > choice would be obvious. Reasonably so, every engineer that's written
> > something is going to want that something to be used and not thrown away in
> > favor of another something without strong evidence 

Re: [DISCUSS] Next steps for Kubernetes operator SIG

2020-09-23 Thread Benedict Elliott Smith
I don't think there's anything about a code drop that's not "The Apache Way"

If there's a consensus (or even strong majority) amongst invested parties, I 
don't see why we could not adopt an operator directly into the project.

It's possible a green field approach might lead to fewer hard feelings, as 
everyone is in the same boat. Perhaps all operators are also suboptimal and 
could be improved with a rewrite? But I think coordinating a lot of different 
entities around an empty codebase is particularly challenging.  I actually 
think it could be better for cohesion and collaboration to have a suboptimal 
but substantive starting point.


On 23/09/2020, 16:11, "Stefan Miklosovic"  
wrote:

I think that from Instaclustr it was stated quite clearly multiple
times that we are "fine to throw it away" if there is something better
and more wide-spread. Indeed, we have invested a lot of time in the
operator but it was not useless at all, we gained a lot of quite
unique knowledge how to put all pieces together. However, I think that
this space is going to be quite fragmented and "balkanized", which is
not always a bad thing, but in a quite narrow area as Kubernetes
operator is, I just do not see how 4 operators are going to be
beneficial for ordinary people ("official" from community, ours,
Datastax one and CassKop (without any significant order)). Sure,
innovation and healthy competition is important but to what extent ...
One can start a Cassandra cluster on Kubernetes just so many times
differently and nobody really likes a vendor lock-in. People wanting
to run a cluster on K8S realise that there are three operators, each
backed by a private business entity, and the community operator is not
there ... Huh, interesting ... One may even start to question what is
wrong with these folks that it takes three companies to build their
own solution.

Having said that, to my perception, Cassandra community just does not
have enough engineers nor contributors to keep 4 operators alive at
the same time (I wish I was wrong) so the idea of selecting the best
one or to merge obvious things and approaches together is
understandable, even if it meant we eventually sunset ours. In
addition, nobody from big players is going to contribute to the code
base of the other one, for obvious reasons, so channeling and
directing this effort into something common for a community seems to
be the only reasonable way of cooperation.

It is quite hard to bootstrap this if the donation of the code in big
chunks / whole repo is out of question as it is not the "Apache way"
(there was some thread running here about this in more depth a while
ago) and we basically need to start from scratch which is quite
demotivating, we are just inventing the wheel and nobody is up to it.
It is like people are waiting for that to happen so they can jump in
"once it is the thing" but it will never materialise or at least the
hurdle to kick it off is unnecessarily high. Nobody is going to invest
in this heavily if there is already a working operator from companies
mentioned above. As I understood it, one reason of not choosing the
way of donating it all is that "the learning and community building
should happen in organic manner and we just can not accept the
donation", but is not it true that it is easier to build a community
around something which is already there rather than trying to build it
around an idea which is quite hard to dedicate to?

On Wed, 23 Sep 2020 at 15:28, Joshua McKenzie  wrote:
>
> > I think there's significant value to the community in trying to coalesce
> > on a single approach,
> I agree. Unfortunately in this case, the parties with a vested interest and
> written operators came to the table and couldn't agree to coalesce on a
> single approach. John Sanda attempted to start an initiative to write a
> best-of-breed combining choice parts of each operator, but that effort did
> not gain traction.
>
> Which is where my hypothesis comes from that if there were a clear "better
> fit" operator to start from we wouldn't be in a deadlock; the correct
> choice would be obvious. Reasonably so, every engineer that's written
> something is going to want that something to be used and not thrown away in
> favor of another something without strong evidence as to why that's the
> better choice.
>
> As far as I know, nobody has made a clear case as to a more compelling
> place to start in terms of an operator donation the project then
> collaborates on. There's no mass adoption evidence nor feature enumeration
> that I know of for any of the approaches anyone's taken, so the discussions
> remain stalled.
>
>
>
> On Wed, Sep 23, 2020 at 7:18 AM, Benedict Elliott Smith wrote:
>

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Oleksandr Petrov
Short question: looking forward, how are we going to maintain three 2i
implementations: SASI, SAI, and 2i?

Another thing I think this CEP is missing is rationale and motivation
about why trie-based indexes were chosen over, say, B-Tree. We did have a
short discussion about this on Slack, but both arguments that I've heard
(space-saving and keeping a small subset of nodes in memory) work only for
the most primitive implementation of a B-Tree. Fully-occupied prefix B-Tree
can have similar properties. There's been a lot of research on B-Trees and
optimisations in those. Unfortunately, I do not have an
implementation sitting around for a direct comparison, but I can imagine
situations when B-Trees may perform better because of simpler construction.
Maybe we should even consider prototyping a prefix B-Tree to have a more
fair comparison.
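
The space argument on both sides can be illustrated with a toy sketch. This is
not taken from SAI, SASI, or any Cassandra code: the function names and sample
terms are hypothetical, the trie is assumed to be an uncompressed character
trie, and the prefix B-Tree is assumed to use the common trick of keeping only
the shortest separator between neighbouring keys in its interior nodes.

```python
def trie_node_count(keys):
    """Count non-root nodes of an uncompressed character trie over `keys`.

    Every distinct non-empty prefix corresponds to exactly one trie node,
    so a prefix shared by several keys is stored only once.
    """
    prefixes = set()
    for key in keys:
        for end in range(1, len(key) + 1):
            prefixes.add(key[:end])
    return len(prefixes)


def shortest_separator(left, right):
    """Return the shortest prefix of `right` that sorts strictly after `left`.

    A prefix B-Tree can store this in an interior node instead of a full key:
    it still routes `left` to one side and `right` to the other.
    """
    if not left < right:
        raise ValueError("keys must be distinct and given in sorted order")
    for end in range(1, len(right) + 1):
        candidate = right[:end]
        if candidate > left:
            return candidate
    return right  # not reached when left < right; kept as a safe fallback


# Hypothetical sample terms, already sorted.
terms = ["cassandra", "cassette", "castle"]
print(sum(len(t) for t in terms))  # characters stored with no sharing at all
print(trie_node_count(terms))      # trie nodes once prefixes are shared
print([shortest_separator(a, b) for a, b in zip(terms, terms[1:])])
```

On these three terms the trie needs 16 nodes for 23 characters, and the
interior separators shrink to "casse" and "cast". Which effect matters more at
index scale is the kind of question a prototype would have to answer.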

Thank you,
-- Alex



On Thu, Sep 10, 2020 at 9:12 AM Jasonstack Zhao Yang <
jasonstack.z...@gmail.com> wrote:

> Thank you Patrick for hosting Cassandra Contributor Meeting for CEP-7 SAI.
>
> The recorded video is available here:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-01+Apache+Cassandra+Contributor+Meeting
>
> On Tue, 1 Sep 2020 at 14:34, Jasonstack Zhao Yang <
> jasonstack.z...@gmail.com>
> wrote:
>
> > Thank you, Charles and Patrick
> >
> > On Tue, 1 Sep 2020 at 04:56, Charles Cao  wrote:
> >
> >> Thank you, Patrick!
> >>
> >> On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin 
> >> wrote:
> >> >
> >> > I just moved it to 8AM for this meeting to better accommodate APAC.
> >> Please
> >> > see the update here:
> >> >
> >>
> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
> >> >
> >> > Patrick
> >> >
> >> > On Mon, Aug 31, 2020 at 10:04 AM Charles Cao 
> >> wrote:
> >> >
> >> > > Patrick,
> >> > >
> >> > > 11AM PST is a bad time for the people in the APAC timezone. Can we
> >> > > move it to 7 or 8AM PST in the morning to accommodate their needs ?
> >> > >
> >> > > ~Charles
> >> > >
> >> > > On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin
> >> > > wrote:
> >> > > >
> >> > > > Meeting scheduled.
> >> > > >
> >> > >
> >>
> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
> >> > > >
> >> > > > Tuesday September 1st, 11AM PST. I added a basic bullet for the
> >> agenda
> >> > > but
> >> > > > if there is more, edit away.
> >> > > >
> >> > > > Patrick
> >> > > >
> >> > > > On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang <
> >> > > > jasonstack.z...@gmail.com> wrote:
> >> > > >
> >> > > > > +1
> >> > > > >
> >> > > > > On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova <
> >> > > e.dimitr...@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > +1
> >> > > > > >
> >> > > > > > On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe <
> >> > > calebrackli...@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > +1
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin <pmcfa...@gmail.com>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > This is related to the discussion Jordan and I had about the contributor
> >> > > > > > > > Zoom call. Instead of open mic for any issue, call it based on a discussion
> >> > > > > > > > thread or threads for higher bandwidth discussion.
> >> > > > > > > >
> >> > > > > > > > I would be happy to schedule one for next week to specifically discuss
> >> > > > > > > > CEP-7. I can attach the recorded call to the CEP after.
> >> > > > > > > >
> >> > > > > > > > +1 or -1?
> >> > > > > > > >
> >> > > > > > > > Patrick
> >> > > > > > > >
> >> > > > > > > > On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie <jmcken...@apache.org>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > > Does community plan to open another discussion or CEP on
> >> > > > > > > > > > modularization?
> >> > > > > > > > >
> >> > > > > > > > > We probably should have a discussion on the ML or monthly contrib call
> >> > > > > > > > > about it first to see how aligned the interested contributors are. Could do
> >> > > > > > > > > that through CEP as well but CEP's (at least thus far sans k8s operator)
> >> > > > > > > > > tend to start with a strong, deeply thought out point of view being
> >> > > > > > > > > expressed.
> >> > 

Re: [DISCUSS] Next steps for Kubernetes operator SIG

2020-09-23 Thread Stefan Miklosovic
I think that from Instaclustr it was stated quite clearly multiple
times that we are "fine to throw it away" if there is something better
and more wide-spread. Indeed, we have invested a lot of time in the
operator, but it was not useless at all; we gained a lot of quite
unique knowledge of how to put all the pieces together. However, I think that
this space is going to be quite fragmented and "balkanized", which is
not always a bad thing, but in a quite narrow area as Kubernetes
operator is, I just do not see how 4 operators are going to be
beneficial for ordinary people ("official" from community, ours,
Datastax one and CassKop (without any significant order)). Sure,
innovation and healthy competition are important but to what extent ...
One can start a Cassandra cluster on Kubernetes just so many times
differently and nobody really likes a vendor lock-in. People wanting
to run a cluster on K8S realise that there are three operators, each
backed by a private business entity, and the community operator is not
there ... Huh, interesting ... One may even start to question what is
wrong with these folks that it takes three companies to build their
own solution.

Having said that, to my perception, the Cassandra community just does not
have enough engineers nor contributors to keep 4 operators alive at
the same time (I wish I was wrong) so the idea of selecting the best
one or to merge obvious things and approaches together is
understandable, even if it meant we eventually sunset ours. In
addition, nobody from big players is going to contribute to the code
base of the other one, for obvious reasons, so channeling and
directing this effort into something common for a community seems to
be the only reasonable way of cooperation.

It is quite hard to bootstrap this if the donation of the code in big
chunks / whole repo is out of question as it is not the "Apache way"
(there was some thread running here about this in more depth a while
ago) and we basically need to start from scratch which is quite
demotivating; we are just reinventing the wheel and nobody is up to it.
It is like people are waiting for that to happen so they can jump in
"once it is the thing" but it will never materialise or at least the
hurdle to kick it off is unnecessarily high. Nobody is going to invest
in this heavily if there is already a working operator from companies
mentioned above. As I understood it, one reason for not choosing the
way of donating it all is that "the learning and community building
should happen in organic manner and we just can not accept the
donation", but is it not true that it is easier to build a community
around something which is already there rather than trying to build it
around an idea which is quite hard to dedicate to?

On Wed, 23 Sep 2020 at 15:28, Joshua McKenzie  wrote:
>
> > I think there's significant value to the community in trying to coalesce
> > on a single approach,
> I agree. Unfortunately in this case, the parties with a vested interest and
> written operators came to the table and couldn't agree to coalesce on a
> single approach. John Sanda attempted to start an initiative to write a
> best-of-breed combining choice parts of each operator, but that effort did
> not gain traction.
>
> Which is where my hypothesis comes from that if there were a clear "better
> fit" operator to start from we wouldn't be in a deadlock; the correct
> choice would be obvious. Reasonably so, every engineer that's written
> something is going to want that something to be used and not thrown away in
> favor of another something without strong evidence as to why that's the
> better choice.
>
> As far as I know, nobody has made a clear case as to a more compelling
> place to start in terms of an operator donation the project then
> collaborates on. There's no mass adoption evidence nor feature enumeration
> that I know of for any of the approaches anyone's taken, so the discussions
> remain stalled.
>
>
>
> On Wed, Sep 23, 2020 at 7:18 AM, Benedict Elliott Smith wrote:
>
> > I think there's significant value to the community in trying to coalesce
> > on a single approach, earlier than later. This is an opportunity to expand
> > the number of active organisations involved directly in the Apache
> > Cassandra project, as well as to more quickly expand the project's
> > functionality into an area we consider urgent and important. I think it
> > would be a real shame to waste this opportunity. No doubt it will be hard,
> > as organisations have certain built-in investments in their own approaches.
> >
> > I haven't participated in these calls as I do not consider myself to have
> > the relevant experience and expertise, and have other focuses on the
> > project. I just wanted to voice a vote in favour of trying to bring the
> > different organisations together on a single approach if possible. Is there
> > anything the project can do to help this happen?
> >
> > On 23/09/2020, 03:04, "Ben Bromhead"  wrote:
> >
> > I think there is 

Re: [DISCUSS] Next steps for Kubernetes operator SIG

2020-09-23 Thread franck.dehay
I can explain quite a bit of the history of why we are in this situation today
if you want, but the important question is:
Who is willing to donate their operator, and control over its future, to the
community?
- Orange is, with CassKop, as soon as we release v1, which should be quite soon.
- who else? and when?

Then we can compare the features :)


> On 23 Sep 2020, at 15:27, Joshua McKenzie  wrote:
> 
>> I think there's significant value to the community in trying to coalesce
>> on a single approach,
> I agree. Unfortunately in this case, the parties with a vested interest and
> written operators came to the table and couldn't agree to coalesce on a
> single approach. John Sanda attempted to start an initiative to write a
> best-of-breed combining choice parts of each operator, but that effort did
> not gain traction.
> 
> Which is where my hypothesis comes from that if there were a clear "better
> fit" operator to start from we wouldn't be in a deadlock; the correct
> choice would be obvious. Reasonably so, every engineer that's written
> something is going to want that something to be used and not thrown away in
> favor of another something without strong evidence as to why that's the
> better choice.
> 
> As far as I know, nobody has made a clear case as to a more compelling
> place to start in terms of an operator donation the project then
> collaborates on. There's no mass adoption evidence nor feature enumeration
> that I know of for any of the approaches anyone's taken, so the discussions
> remain stalled.
> 
> 
> 
> On Wed, Sep 23, 2020 at 7:18 AM, Benedict Elliott Smith wrote:
> 
>> I think there's significant value to the community in trying to coalesce
>> on a single approach, earlier than later. This is an opportunity to expand
>> the number of active organisations involved directly in the Apache
>> Cassandra project, as well as to more quickly expand the project's
>> functionality into an area we consider urgent and important. I think it
>> would be a real shame to waste this opportunity. No doubt it will be hard,
>> as organisations have certain built-in investments in their own approaches.
>> 
>> I haven't participated in these calls as I do not consider myself to have
>> the relevant experience and expertise, and have other focuses on the
>> project. I just wanted to voice a vote in favour of trying to bring the
>> different organisations together on a single approach if possible. Is there
>> anything the project can do to help this happen?
>> 
>> On 23/09/2020, 03:04, "Ben Bromhead"  wrote:
>> 
>> I think there is certainly an appetite to donate and standardise on a
>> given operator (as mentioned in this thread).
>> 
>> I personally found the SIG hard to participate in due to time zones and
>> the synchronous nature of it.
>> 
>> So while it was a great forum to dive into certain details for a subset of
>> participants and a worthwhile endeavour, I wouldn't paint it as an accurate
>> reflection of community intent.
>> 
>> I don't think that any participants want to continue down the path of "let
>> a thousand flowers bloom". That's why we are looking towards CassKop (as
>> well as a number of technical reasons).
>> 
>> Some of the recorded meetings and outputs can also be found if you are
>> interested in some primary sources
>> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG
>> .
>> 
>> From what I understand second-hand from talking to people on the SIG calls,
>> there was a general inability to agree on an existing operator as a
>> starting point and not much engagement on taking best of breed from the
>> various to combine them. Seems to leave us in the "let a thousand flowers
>> bloom" stage of letting operators grow in the ecosystem and seeing which
>> ones meet the needs of end users before talking about adopting one into the
>> foundation.
>> 
>> Great to hear that you folks are joining forces though! Bodes well for C*
>> users that are wanting to run things on k8s.
>> 
>> On Tue, Sep 22, 2020 at 4:26 AM, Ben Bromhead 
>> wrote:
>> 
>> For what it's worth, a quick update from me:
>> 
>> CassKop now has at least two organisations working on it substantially
>> (Orange and Instaclustr) as well as the numerous other contributors.
>> 
>> Internally we will also start pointing others towards CassKop once a few
>> things get merged. While we are not sunsetting our operator yet, it is
>> certainly looking that way.
>> 
>> I'd love to see the community adopt it as a starting point for working
>> towards whatever level of functionality is desired.
>> 
>> Cheers
>> 
>> Ben
>> 
>> On Fri, Sep 11, 2020 at 2:37 PM John Sanda  wrote:
>> 
>> On Thu, Sep 10, 2020 at 5:27 PM Josh McKenzie 
>> wrote:
>> 
>> There's basically 1 java driver in the C* ecosystem. We have 3? 4? or more
>> operators in the ecosystem. Has one of them hit a clear supermajority of
>> adoption that makes it the de facto default and makes sense to pull it
>> 

Re: [DISCUSS] Next steps for Kubernetes operator SIG

2020-09-23 Thread Joshua McKenzie
> I think there's significant value to the community in trying to coalesce
> on a single approach,
I agree. Unfortunately in this case, the parties with a vested interest and
written operators came to the table and couldn't agree to coalesce on a
single approach. John Sanda attempted to start an initiative to write a
best-of-breed combining choice parts of each operator, but that effort did
not gain traction.

Which is where my hypothesis comes from that if there were a clear "better
fit" operator to start from we wouldn't be in a deadlock; the correct
choice would be obvious. Reasonably so, every engineer that's written
something is going to want that something to be used and not thrown away in
favor of another something without strong evidence as to why that's the
better choice.

As far as I know, nobody has made a clear case as to a more compelling
place to start in terms of an operator donation the project then
collaborates on. There's no mass adoption evidence nor feature enumeration
that I know of for any of the approaches anyone's taken, so the discussions
remain stalled.



On Wed, Sep 23, 2020 at 7:18 AM, Benedict Elliott Smith  wrote:

> I think there's significant value to the community in trying to coalesce
> on a single approach, earlier than later. This is an opportunity to expand
> the number of active organisations involved directly in the Apache
> Cassandra project, as well as to more quickly expand the project's
> functionality into an area we consider urgent and important. I think it
> would be a real shame to waste this opportunity. No doubt it will be hard,
> as organisations have certain built-in investments in their own approaches.
>
> I haven't participated in these calls as I do not consider myself to have
> the relevant experience and expertise, and have other focuses on the
> project. I just wanted to voice a vote in favour of trying to bring the
> different organisations together on a single approach if possible. Is there
> anything the project can do to help this happen?
>
> On 23/09/2020, 03:04, "Ben Bromhead"  wrote:
>
> I think there is certainly an appetite to donate and standardise on a
> given operator (as mentioned in this thread).
>
> I personally found the SIG hard to participate in due to time zones and
> the synchronous nature of it.
>
> So while it was a great forum to dive into certain details for a subset of
> participants and a worthwhile endeavour, I wouldn't paint it as an accurate
> reflection of community intent.
>
> I don't think that any participants want to continue down the path of "let
> a thousand flowers bloom". That's why we are looking towards CassKop (as
> well as a number of technical reasons).
>
> Some of the recorded meetings and outputs can also be found if you are
> interested in some primary sources
> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG
> .
>
> From what I understand second-hand from talking to people on the SIG calls,
> there was a general inability to agree on an existing operator as a
> starting point and not much engagement on taking best of breed from the
> various to combine them. Seems to leave us in the "let a thousand flowers
> bloom" stage of letting operators grow in the ecosystem and seeing which
> ones meet the needs of end users before talking about adopting one into the
> foundation.
>
> Great to hear that you folks are joining forces though! Bodes well for C*
> users that are wanting to run things on k8s.
>
> On Tue, Sep 22, 2020 at 4:26 AM, Ben Bromhead wrote:
>
> For what it's worth, a quick update from me:
>
> CassKop now has at least two organisations working on it substantially
> (Orange and Instaclustr) as well as the numerous other contributors.
>
> Internally we will also start pointing others towards CassKop once a few
> things get merged. While we are not sunsetting our operator yet, it is
> certainly looking that way.
>
> I'd love to see the community adopt it as a starting point for working
> towards whatever level of functionality is desired.
>
> Cheers
>
> Ben
>
> On Fri, Sep 11, 2020 at 2:37 PM John Sanda  wrote:
>
> On Thu, Sep 10, 2020 at 5:27 PM Josh McKenzie wrote:
>
> There's basically 1 java driver in the C* ecosystem. We have 3? 4? or more
> operators in the ecosystem. Has one of them hit a clear supermajority of
> adoption that makes it the de facto default and makes sense to pull it into
> the project?
>
> We as a project community were pretty slow to move on building a PoV around
> kubernetes so we find ourselves in a situation with a bunch of contenders
> for inclusion in the project. It's not clear to me what heuristics we'd use
> to gauge which one would be the best fit for inclusion outside letting
> community adoption speak.
>
> ---
> Josh McKenzie
>
> We actually talked a good bit on the SIG call earlier today about
> heuristics. We need to document what functionality an operator should
> include at 

Re: [DISCUSS] Next steps for Kubernetes operator SIG

2020-09-23 Thread Benedict Elliott Smith
I think there's significant value to the community in trying to coalesce on a 
single approach, earlier than later.  This is an opportunity to expand the 
number of active organisations involved directly in the Apache Cassandra 
project, as well as to more quickly expand the project's functionality into an 
area we consider urgent and important.  I think it would be a real shame to 
waste this opportunity.  No doubt it will be hard, as organisations have 
certain built-in investments in their own approaches.

I haven't participated in these calls as I do not consider myself to have the 
relevant experience and expertise, and have other focuses on the project.  I 
just wanted to voice a vote in favour of trying to bring the different 
organisations together on a single approach if possible.  Is there anything the 
project can do to help this happen?
 

On 23/09/2020, 03:04, "Ben Bromhead"  wrote:

I think there is certainly an appetite to donate and standardise on a given
operator (as mentioned in this thread).

I personally found the SIG hard to participate in due to time zones and the
synchronous nature of it.

So while it was a great forum to dive into certain details for a subset of
participants and a worthwhile endeavour, I wouldn't paint it as an accurate
reflection of community intent.

I don't think that any participants want to continue down the path of "let
a thousand flowers bloom". That's why we are looking towards CassKop (as
well as a number of technical reasons).

Some of the recorded meetings and outputs can also be found if you are
interested in some primary sources

https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG
.

> From what I understand second-hand from talking to people on the SIG calls,
> there was a general inability to agree on an existing operator as a
> starting point and not much engagement on taking best of breed from the
> various to combine them. Seems to leave us in the "let a thousand flowers
> bloom" stage of letting operators grow in the ecosystem and seeing which
> ones meet the needs of end users before talking about adopting one into the
> foundation.
>
> Great to hear that you folks are joining forces though! Bodes well for C*
> users that are wanting to run things on k8s.
>
>
>
> On Tue, Sep 22, 2020 at 4:26 AM, Ben Bromhead wrote:
>
> > For what it's worth, a quick update from me:
> >
> > CassKop now has at least two organisations working on it substantially
> > (Orange and Instaclustr) as well as the numerous other contributors.
> >
> > Internally we will also start pointing others towards CassKop once a few
> > things get merged. While we are not sunsetting our operator yet, it is
> > certainly looking that way.
> >
> > I'd love to see the community adopt it as a starting point for working
> > towards whatever level of functionality is desired.
> >
> > Cheers
> >
> > Ben
> >
> > On Fri, Sep 11, 2020 at 2:37 PM John Sanda  wrote:
> >
> > On Thu, Sep 10, 2020 at 5:27 PM Josh McKenzie wrote:
> >
> > There's basically 1 java driver in the C* ecosystem. We have 3? 4? or more
> > operators in the ecosystem. Has one of them hit a clear supermajority of
> > adoption that makes it the de facto default and makes sense to pull it into
> > the project?
> >
> > We as a project community were pretty slow to move on building a PoV around
> > kubernetes so we find ourselves in a situation with a bunch of contenders
> > for inclusion in the project. It's not clear to me what heuristics we'd use
> > to gauge which one would be the best fit for inclusion outside letting
> > community adoption speak.
> >
> > ---
> > Josh McKenzie
> >
> > We actually talked a good bit on the SIG call earlier today about
> > heuristics. We need to document what functionality an operator should
> > include at level 0, level 1, etc. We did discuss this a good bit during
> > some of the initial SIG meetings, but I guess it wasn't really a focal
> > point at the time. I think we should also provide references to existing
> > operator projects and possibly other related projects. This would benefit
> > both community users as well as people working on these projects.
> >
> > - John
> >
> > --
> >
> > Ben Bromhead
> >
> > Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692
> >
>


-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr | (650) 284 9692


