>
> Totally agree, but even with the excellent tooling you've worked on (were
> it universally adopted) and Harry + Simulator, we're still not exercising
> hundreds of nodes with petabytes of data heavily using new features in
> surprising ways; I argue that this would be necessary in order to certify
> something as production ready without a "released to the wild in beta"
> phase.


You're right, but testing on 1K nodes is also on the extreme end of
things.  Most of the issues I run into I can spot with a 3 node cluster in
under an hour.  Here are a few examples of things that were completely
preventable had *any* testing been done at all:

* I found CASSANDRA-19477 on a 3 node cluster running a basic write
workload, profiling a node while another was down.  The overhead from hints
could have caused outages in a multi-DC environment if the link between the
DCs was severed.  This was a potentially massive flaw: checking the
filesystem and running a needless regex on every hint (see the sketch after
this list).

* MVs - I helped a team move off MVs.  They had 1GB of data on fairly
decent-sized nodes, and it was causing outages.  Repair doesn't work at
all - was it tested outside of a laptop?

* IR - this goes WAY back, but it was tragically bad.  It was made the
default, and running it gave you a non-trivial chance of flooding your node
with thousands of tiny sstables.  At TLP we had to rescue multiple clusters
put in this state.

* Vector search - how did nobody notice that you couldn't search a few
hundred MB of data?  Again, where was the testing?  CASSANDRA-18715 is
where it was merged.  Zero evidence to show it's ready for production.

Related Slack convo:
https://the-asf.slack.com/archives/CK23JSY2K/p1731531536695989

* When we were going to release 5.0 RC1, I brought up an OOM I found that
affected off-heap memtables, and some people actually wanted to push ahead
with the release even though I had documented an issue that causes the DB
to crash.  This wasn't even a testing problem.
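
Back to CASSANDRA-19477 for a second.  Here's a contrived sketch of the
anti-pattern - to be clear, this is not the actual Cassandra code, and the
pattern, directory, and method names are all made up for illustration.  The
point is a directory listing plus a filename regex on the hot path, where a
cached counter would do:

    import java.io.File;
    import java.util.regex.Pattern;

    class HintOverheadSketch {
        // Hypothetical pattern and directory, for illustration only.
        private static final Pattern HINT_FILE = Pattern.compile(".+\\.hints");
        private final File hintsDir = new File("/var/lib/cassandra/hints");
        private long cachedSize = 0;

        // Anti-pattern: every hint write lists the directory and regex-matches
        // every filename, so the cost grows with backlog size times write rate.
        long totalHintsSizeSlow() {
            long total = 0;
            File[] files = hintsDir.listFiles();
            if (files == null) return 0;
            for (File f : files)
                if (HINT_FILE.matcher(f.getName()).matches())
                    total += f.length();
            return total;
        }

        // Cheaper shape: maintain a running total in memory and adjust it as
        // hint files are written and deleted.
        long totalHintsSizeFast() { return cachedSize; }
        void onHintWritten(long bytes) { cachedSize += bytes; }
        void onHintFileDeleted(long bytes) { cachedSize -= bytes; }
    }

With a severed DC link, hints pile up, the directory grows, and the slow
version gets slower exactly when the cluster is already under stress.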

@Jeremy: Issues like the last two (Vector search and the RC) make it
difficult to just trust the word of DataStax, because you talk up Vector
search so much, yet what we have in OSS isn't useful outside of small
demos.  And to be honest, we should *never* just trust people; we need at
least *some* data.  And in the case of the RC, you guys wanted to release
5.0 knowing it would crash on our users.

I hope I've made my point.  The bar for merging new functionality should be
higher.  Features should work with 1TB of data on 3 nodes - that's a low
bar.  I've spent at least a thousand hours over the last 5 years developing
the tooling to run these tests, so there's no reason not to run them, and
when we know things are broken, we shouldn't ship them.

Now, some praise!!

* I think the work done on UCS was handled very differently than Vector,
for example.  We got Vector v1, essentially a non-functioning version, and
Astra got v3.  UCS, in contrast, was worked on for years in Astra first -
DS used Astra as a guinea pig.  I've spent about 100 hours testing it, have
moved production clients to it, and it's solid.  Yeah, I had to isolate
myself in a room with a printed copy of the paper, a pen, and no other
distractions to understand it, but that's better than it just not working.
We can fix the docs; fixing fundamental architectural flaws is a lot harder.

* The work being done on Accord is far more in the open; every time I see a
JIRA related to it, I am happy.  Huge kudos to everyone working on it.

* I worked with Caleb to find potential issues with SAI.  Guardrails were
put in place to prevent the problematic queries from being run, which
discourages use cases that don't scale well - full-cluster queries.  I was
really happy to work together on this and hopefully prevent some really sad
users.
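
For anyone who hasn't looked at how guardrails behave, this is the general
warn/fail-threshold shape - a simplified sketch of the idea, not the actual
Guardrails API in the codebase:

    final class ThresholdGuardrail {
        private final String name;
        private final long warn, fail;

        ThresholdGuardrail(String name, long warn, long fail) {
            this.name = name; this.warn = warn; this.fail = fail;
        }

        // Reject outright past the fail threshold; complain past warn.
        void check(long observed) {
            if (observed > fail)
                throw new IllegalStateException(name + " over fail threshold: " + observed);
            if (observed > warn)
                System.err.println("WARN: " + name + " over warn threshold: " + observed);
        }
    }

The operator picks the thresholds, so clusters can tighten or relax them
based on what their hardware actually tolerates.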

> If our codebase was more loosely coupled I think this would be great. As it
> stands, it's very hard to add things that you plan on later ripping out
> (i.e. experimental that might not last). Definitely like the sentiment on
> the way we view / commit to a beta feature.


It would be incredible to have an incubator JAR that could be dropped in
and specific features enabled, so we could get them tested without shipping
them and taking on an obligation to maintain them.  Zero backwards
compatibility, works only with the latest releases, and we aggressively cut
things that don't work out.  I quiver with joy at the mere idea.

Jon


On Mon, Dec 9, 2024 at 11:33 AM Josh McKenzie <jmcken...@apache.org> wrote:

> However I don't think "beta" is much better.  I don't think it should be
> the path for all new features - I think it should be a case-by-case basis.
> Going to the next major/minor version in and of itself obviously doesn't
> make the feature more stable or more production ready.  We would need a
> plan to go from beta to GA presumably with "work out X deficiency before
> calling it GA."
>
> Agree - features should go this path on a case-by-case basis (scale,
> impact, complexity, etc). I disagree on the "delay for a major not helping"
> - it's not *causally* going to make anything better in isolation, but it
> is codifying the current reality that when something is first merged in
> it's almost never ready for production use despite our best efforts at
> validation. Of note: this intersects unfavorably with our trouble
> getting releases out in a timely fashion, unless we moved to being able to
> remove the beta flag on a minor if we all agreed something had stabilized.
>
> This points to the bind we often find ourselves in with big features. To
> have enough confidence to consider something GA ready, it has to be "used in
> anger" at some non-trivial scale - something we just haven't been able to do
> in a CI environment pre-release, much less pre-merge. Harry and the
> Simulator are both *major* steps in the direction of better validation,
> but right now I don't believe they tackle the combinatorial surface area of
> cluster size, use-cases, and query patterns the way we need to in order
> to have confidence that something is ready for widespread adoption.
>
> We really shouldn't be put in a position where stuff gets released, hyped
> up, then we find out it's obviously not ready for real world use.
>
> Totally agree, but even with the excellent tooling you've worked on (were
> it universally adopted) and Harry + Simulator, we're still not exercising
> hundreds of nodes with petabytes of data heavily using new features in
> surprising ways; I argue that this would be necessary in order to certify
> something as production ready without a "released to the wild in beta"
> phase.
>
> This reminds me of when Jason Brown was working on an in-jvm Gossip
> simulator that'd spin up thousands of instances of Gossip to communicate
> with each other to simulate large-scale Gossip operations. It's
> cost-prohibitive to test at that scale when our unit is "a node" and
> real-world deployments run thousands of nodes with petabytes of data, so
> that leaves us with the proxy
> signal of "We unit-tested, integration-tested, and fuzzed this thing. Now we
> need people to use it in real scenarios to find out what's *really* wrong
> with it."
> This leads to a Very Bad Experience for our users and a rightful reputation
> of "Using a new feature right when it first drops is a bad idea in prod;
> test it out in QA or in a new app / cluster and suss out the bugs."
>
> At least to me, the "beta" flag is a great signal of our commitment to
> something + its API stability and a call to action for the broader
> community to beat on something to provide feedback, at least given the
> limitations we have today with pre-release validation.
>
> Anything we have marked experimental that's not being actively developed
> (with no plans to develop in the future, i.e. MVs) should probably be
> removed from the codebase.
>
> Maybe there is an argument for "experimental"=this is here to get feedback
> but there's no commitment it will make it to production ready and "beta"=we
> think this is done but we'd like to see some production use before
> declaring it stable. For beta, we'll treat bugs with the same priority as
> "stable" (or at least close to)?
>
> If our codebase was more loosely coupled I think this would be great. As
> it stands, it's very hard to add things that you plan on later ripping out
> (i.e. experimental that might not last). Definitely like the sentiment on
> the way we view / commit to a beta feature.
>
> On Mon, Dec 9, 2024, at 1:56 PM, Ekaterina Dimitrova wrote:
>
> Hey Jon,
> The following quick test shows me that vector search is marked as
> experimental (it is just not in cassandra.yaml like materialized views, etc.)
>
> cqlsh:k> CREATE TABLE t (pk int, str_val text, val vector<float, 3>,
> PRIMARY KEY(pk));
>
> cqlsh:k> CREATE CUSTOM INDEX ON t(val) USING 'StorageAttachedIndex';
>
>
> Warnings :
>
>
> SAI ANN indexes on vector columns are experimental and are not recommended
> for production use.
>
> They don't yet support SELECT queries with:
>
>  * Consistency level higher than ONE/LOCAL_ONE.
>
>  * Paging.
>
>  * No LIMIT clauses.
>
>  * PER PARTITION LIMIT clauses.
>
>  * GROUP BY clauses.
>
>  * Aggregation functions.
>
>  * Filters on columns without a SAI index.
>
>
> I do agree that there is also a differentiation between experimental and
> beta. But I need to think more before expressing a concrete
> opinion or suggestions here. Though I believe this conversation is healthy to
> have and shows the maturity of our project. Thank you, Josh!
>
>
> Best regards,
>
> Ekaterina
>
>
> On Mon, 9 Dec 2024 at 13:21, Jon Haddad <j...@rustyrazorblade.com> wrote:
>
> The tough thing here is that MVs were marked experimental retroactively,
> because by the time the problems were known, there wasn't much anyone could
> do.  Experimental was our way of saying "oops, we screwed up, let's put a
> label on it" and the same label got applied to a bunch of new stuff
> including Java 17.  They're not even close to being in the same category,
> but they're labeled the same and people treat them as equivalent.
>
> If we had known MVs were so broken before they were merged, they would have
> been -1'ed.  Same with incremental repair (till 4.0), and vector search
> today.  I would have -1'ed all three of these if it had been known how
> poorly they actually performed at the time they were committed.
>
> Side note, vector search isn't marked as experimental today, but it's not
> even usable for non-trivial datasets out of the box, so it should be marked
> as such at this point.
>
> I really wish this stuff were tested at a reasonable scale across various
> failure modes before merging, because the harm it does to the community is
> real.  We really shouldn't be put in a position where stuff gets released,
> hyped up, then we find out it's obviously not ready for real world use.  I
> built my tooling (tlp-cluster, now easy-cass-lab, and tlp-stress, now
> easy-cass-stress) with this in mind, but sadly I haven't seen much use of
> it to verify patches.  The only reason I found a memory leak in
> CASSANDRA-15452 was because I used these tools on multi-TB datasets over
> several days.
>
>
> Jon
>
>
> On Mon, Dec 9, 2024 at 9:55 AM Slater, Ben via dev <
> dev@cassandra.apache.org> wrote:
>
> I'm a little worried by the idea of grouping MVs in with things like a
> Java version under the same "beta" label (acknowledging that they are
> currently grouped under the same "experimental" label).
>
> To me, "beta" implies it's pretty close to production ready and there is
> an intention to get it there in the near future. I don't
> think this really describes MVs, as I don't see anyone who looks like they
> are trying to get them truly production ready (although I could easily be
> wrong on that).
>
> Maybe there is an argument for "experimental"=this is here to get feedback
> but there's no commitment it will make it to production ready and "beta"=we
> think this is done but we'd like to see some production use before
> declaring it stable. For beta, we'll treat bugs with the same priority as
> "stable" (or at least close to)?
>
> Cheers
> Ben
>
>
>
>
> ------------------------------
>
> *From:* Jon Haddad <j...@rustyrazorblade.com>
> *Sent:* 09 December 2024 09:43
> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
> *Subject:* Re: [DISCUSS] Experimental flagging (fork from Re-evaluate
> compaction defaults in 5.1/trunk)
>
> I like this.  There are a few things marked as experimental today, so I'll
> take a stab at making this more concrete, and I think we should be open to
> graduating certain things out of beta to GA at a faster cycle than a major
> release.
>
> Java versions, for example, should really move out of "beta" quickly.  We
> test against them, and we're not going to drop new versions.  So if we're
> looking at C* 5.0, we should move Java 17 out of experimental / beta
> immediately and call it GA.
>
> SAI and UCS should probably graduate no later than 5.1.
>
> On the other hand, MVs have enough warts that I actively recommend against
> using them; they should stay in beta till we can actually repair them.
>
> I don't know if anyone's actually used transient replication or if it's
> even beta quality... that might actually warrant being called experimental
> still.
>
> 'ALTER ... DROP COMPACT STORAGE' is flagged as experimental.  I'm not sure
> what to do with this.  I advise people to migrate their data for any Thrift ->
> CQL cases, mostly because the edge cases are so hard to know in advance,
> especially since by now these codebases are ancient and the original
> developers are long gone.
>
> Thoughts?
>
> Jon
>
>
>
>
> On Mon, Dec 9, 2024 at 6:28 AM Josh McKenzie <jmcken...@apache.org> wrote:
>
>
> Jon stated:
>
> Side note: I think experimental has been over-used and has lost all
> meaning.  How is Java 17 experimental?  Very confusing for the community.
>
>
> Dinesh followed with:
>
> Philosophically, as a project, we should wait until critical features like
> these reach a certain level of maturity prior to recommending it as a
> default. For me maturity is a function of adoption by diverse use-cases in
> production and scale.
>
>
> I'd like to discuss 2 ideas related to the above:
>
>    1. We rename / alias "experimental" to "beta". It's a word that's
>    ubiquitous in our field and communicates the correct level of expectation
>    to our users (API stable, may have bugs)
>    2. *All new features* go through one major (either semver MAJOR or
>    MINOR) as "beta"
>
>
> To Jon's point, "experimental" was really a kludge to work around
> Materialized Views having some very sharp edges that users had to be very
> aware of. We haven't really used the flagging much (at all?) since then,
> and we don't have a formalized way to shepherd a new feature through a
> "soak" period where it can "reach a certain level of maturity". We're
> caught in a chicken-or-egg scenario with our current need to get a feature
> released more broadly to have confidence in its stability (to Dinesh's
> point).
>
> In my mind, the following feature evolution would be healthy for us and
> good for our users:
>
>    1. Beta
>    2. Generally Available
>    3. Default (where appropriate)
>
> To graduate from Beta -> GA, good UX, user-facing documentation, and a
> [DISCUSS] thread where we have a clear consensus of readiness all seem
> like healthy steps. From GA -> Default, a [DISCUSS] like the one we're
> having re: compaction strategies, unearthing shortcomings, edge-cases,
> documentation needs, etc.
>
> Curious what others think.
>
> ~Josh
>
>
>
