[ https://issues.apache.org/jira/browse/CASSANDRA-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623987#comment-17623987 ]
Alex Petrov edited comment on CASSANDRA-17240 at 10/25/22 6:58 PM:
-------------------------------------------------------------------
{quote}I don't think that we should default to consider experimental everything
that hasn't been tested with Harry, nor consider production-ready everything
that has been tested with it
{quote}
[~adelapena] I agree about the second part (i.e. not considering something
production-ready just because it has been tested with Harry), but with the
former I tend to disagree. I realise there might be no desire to adopt Harry,
either by you personally or by members of your team, but Harry has proven
itself to find bugs that other testing approaches do not, and it models
Cassandra behaviour in a way that can reasonably only be matched by
handcrafting a very large number of test cases, which is of course impossible.
If you can provide an example of an integration testing tool that can match
Harry in its ability to validate a feature, I'll be happy to check it out and
compare approaches.
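(As a rough illustration of what I mean by "modelling behaviour" rather than
handcrafting cases, here is a tiny, self-contained sketch of the general
model-based fuzzing idea: generate operations deterministically from a seed,
apply them both to the system under test and to a trivial in-memory model, and
check that reads agree. This is not Harry's actual API, and the stand-in maps
are obviously not a database; every name below is made up purely for
illustration.)
{code:java}
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch of model-based fuzzing: every operation is generated
// from a seed and applied both to a trusted in-memory model and to the
// system under test, so any divergence on reads is caught immediately and
// is reproducible from the seed alone. Not Harry's API; illustration only.
public class ModelBasedFuzzSketch {
    public static void main(String[] args) {
        long seed = args.length > 0 ? Long.parseLong(args[0]) : 42L;
        Random rng = new Random(seed);

        TreeMap<Long, Long> model = new TreeMap<>(); // trusted oracle
        TreeMap<Long, Long> sut = new TreeMap<>();   // stand-in for the system under test

        for (int i = 0; i < 1_000_000; i++) {
            long key = rng.nextInt(1_000); // small key space forces overwrites
            if (rng.nextBoolean()) {
                long value = rng.nextLong();
                model.put(key, value);
                sut.put(key, value);        // in a real harness: a CQL write
            } else {
                Long expected = model.get(key);
                Long actual = sut.get(key); // in a real harness: a CQL read
                boolean mismatch = expected == null ? actual != null : !expected.equals(actual);
                if (mismatch) {
                    throw new AssertionError("Mismatch at op " + i + " (seed " + seed + "): key=" + key
                                             + " expected=" + expected + " actual=" + actual);
                }
            }
        }
        System.out.println("All generated operations agreed with the model (seed " + seed + ")");
    }
}
{code}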
If I'm not mistaken, there is no publicly available tooling that was used to
validate the _correctness_ of Cassandra as a whole with trie memtables enabled.
I realise there may be some internal tooling you have used for verification,
but unless that tooling is available for others to build confidence and
reproduce results, I'm afraid it is not enough, either.
Lastly, we had a discussion with [~blambov] during ApacheCon, and he has also
agreed that we should not make this feature GA until we've exhaustively tested
it with Harry.
{quote}seen some real usage
{quote}
I do not think that "some" real usage is a good measure by any means. We're
testing small and large instances, a multitude of schemas, and both short- and
long-running tests. A database should work for every use-case that can be
thought of, not just for some specific use-case, and tooling around
verification should strive to cover all of those cases, especially for features
as crucial as this one.
I think with the 4.0 release we've been able to prove that fuzz testing is the
only way to build confidence, since smaller, targeted tests often exercise only
a fraction of our code.
bq. trying to force developers to use Harry if they want to move their features
out of the experimental status
I do not see why not. If additions to Paxos aren't going to be tested with a
simulator, I don't think anyone would feel safe using them. Similarly, we do
have a strong preference (even though not a general rule) for in-jvm dtests
over python dtests. None of these tools existed until very recently.
I would not go as far as "forcing" anyone to use Harry. I think it's best to
let people make wise decisions based on the data available. But unless it can
be demonstrated that at least equivalent rigour was applied to testing, someone
will just have to run Harry tests. And I think both Caleb and I have not only
been advocating for, but also actively offering help with, testing both SAI and
trie-based indexes with Harry.
> CEP-19: Trie memtable implementation
> ------------------------------------
>
> Key: CASSANDRA-17240
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17240
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Memtable
> Reporter: Branimir Lambov
> Assignee: Branimir Lambov
> Priority: Normal
> Fix For: 4.2
>
> Attachments: SkipListMemtable-OSS.png, TrieMemtable-OSS.png,
> density_SG.html.gz, density_test_with_sharding.html.gz, latency-1_1-95.png,
> latency-9_1-95.png, throughput_SG.png, throughput_apache.png
>
> Time Spent: 13.5h
> Remaining Estimate: 0h
>
> Trie-based memtable implementation as described in CEP-19, built on top of
> CASSANDRA-17034 and CASSANDRA-6936.
> The implementation is available in this
> [branch|https://github.com/blambov/cassandra/tree/CASSANDRA-17240].