[
https://issues.apache.org/jira/browse/CASSANDRA-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh McKenzie reassigned CASSANDRA-15536:
-----------------------------------------
Assignee: (was: Josh McKenzie)
> 4.0 Quality: Components and Test Plans
> --------------------------------------
>
> Key: CASSANDRA-15536
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15536
> Project: Cassandra
> Issue Type: Epic
> Components: Test/benchmark, Test/dtest/python, Test/fuzz, Test/unit
> Reporter: Josh McKenzie
> Priority: High
> Fix For: 4.0.x
>
>
> [Source doc from
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#].
> Jira migrated from
> [cwiki|https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality:+Components+and+Test+Plans]
> The overarching goal of the 4.0 release is that Cassandra 4.0 should be in a
> state where major users would run it in production when it is cut. To gain
> this confidence there are various ongoing testing efforts involving
> correctness, performance, and ease of use. In this page we try to coordinate
> and identify blockers for subsystems before we can release 4.0.
> For each component we strive to have shepherds and contributors involved.
> Shepherds should be committers or knowledgeable component owners and are
> responsible for driving their blocking tickets to completion and ensuring
> quality in their claimed area, while contributors have signed up to help
> verify that subsystem by running tests or contributing fixes. Shepherds also
> ideally help set testing standards and ensure that we meet a high standard of
> quality in their claimed area.
> If you are interested in contributing to testing 4.0, please add your name as
> assignee if you want to drive things, or as reviewer if you just want to
> participate and review, and get involved in the tracking ticket and the dev
> list/IRC discussions involving that component.
> h3. Targeted Components / Subsystems
> We've tried to collect some of the major components or subsystems that we
> want to ensure work properly towards having a great 4.0 release. If you think
> something is missing please add it. Better yet volunteer to contribute to
> testing it!
> h4. Internode Messaging
> In 4.0 we're getting a new Netty-based inter-node communication system
> (CASSANDRA-8457). As internode messaging is vital to the correctness and
> performance of the database, we should make sure that all forms (TLS,
> compressed, low latency, high latency, etc.) of internode messaging
> function correctly.
> h4. Test Infrastructure / Automation: Diff Testing
> Diff testing is a form of model-based testing in which two clusters are
> exhaustively compared to assert identity. To support Apache Cassandra 4.0
> validation, contributors have developed cassandra-diff. This is a Spark
> application that distributes the token range over a configurable number of
> Spark executors, then parallelizes randomized forward and reverse reads with
> varying paging sizes to read and compare every row present in the cluster,
> persisting a record of mismatches for investigation. This methodology has
> been instrumental in identifying data loss, data corruption, and incorrect
> response issues introduced in early Cassandra 3.0 releases.
> cassandra-diff and associated documentation can be found at:
> [https://github.com/apache/cassandra-diff]. Contributors are encouraged to
> run diff tests against clusters they manage and report issues to ensure
> workload diversity across the project.
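The diff-testing approach described above can be illustrated with a minimal sketch. This is not cassandra-diff's code: the two clusters are stood in for by plain dictionaries keyed by token, the work is done sequentially rather than on Spark executors, and every function name here is hypothetical. It only shows the general shape: split the token range, read each split with a randomized direction and page size, and record mismatches.

```python
import random

# Token bounds of the Murmur3Partitioner (signed 64-bit range).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(num_splits):
    """Split the full token range into contiguous, inclusive (start, end) pairs."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    bounds = [MIN_TOKEN + (total * i) // num_splits for i in range(num_splits + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_splits)]


def read_range(cluster, start, end, page_size, reverse=False):
    """Toy paged read over a dict-backed 'cluster': yield (token, row) pairs."""
    tokens = sorted((t for t in cluster if start <= t <= end), reverse=reverse)
    for offset in range(0, len(tokens), page_size):
        for token in tokens[offset:offset + page_size]:
            yield token, cluster[token]


def diff_split(source, target, start, end, rng):
    """Compare one split, using a randomized read direction and page size."""
    reverse = rng.choice([True, False])
    page_size = rng.randint(1, 100)
    src = dict(read_range(source, start, end, page_size, reverse))
    tgt = dict(read_range(target, start, end, page_size, reverse))
    mismatches = []
    for token in sorted(src.keys() | tgt.keys()):
        if src.get(token) != tgt.get(token):
            # Record (token, source row, target row); None means "row missing".
            mismatches.append((token, src.get(token), tgt.get(token)))
    return mismatches


def diff_clusters(source, target, num_splits=8, seed=0):
    """Diff every split in turn and collect all mismatches for investigation."""
    rng = random.Random(seed)
    mismatches = []
    for start, end in split_token_range(num_splits):
        mismatches.extend(diff_split(source, target, start, end, rng))
    return mismatches
```

In the real tool each split would be handed to a separate executor and mismatches persisted rather than returned in memory; the sketch keeps everything in one process so the comparison logic is easy to follow.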
> h4. System Tables and Internal Schema
> This task covers a review of and minor bug fixes to local and distributed
> system keyspaces. Planned work in this area is now complete.
> h4. Source Audit and Performance Testing: Streaming
> This task covers an audit of the Streaming implementation in Apache Cassandra
> 4.0. In this release, contributors have implemented full-SSTable streaming to
> improve performance and reduce memory pressure. Internode messaging changes
> implemented in CASSANDRA-15066 adjacent to streaming suggested that review of
> the streaming implementation itself may be desirable. Prior work also covered
> performance testing of full-SSTable streaming.
> h4. Test Infrastructure / Automation: "Harry"
> CASSANDRA-15348 - Harry: generator library and extensible framework for fuzz
> testing Apache Cassandra
> Harry is a component for fuzz testing and verification of Apache
> Cassandra clusters at scale. Harry makes it possible to run tests that
> efficiently validate the state of both dense nodes (to test the local
> read-write path) and large clusters (to test the distributed read-write
> path). Harry
> defines a model that holds the state of the database, generators that produce
> reproducible, pseudo-random schemas, mutations, and queries, and a validator
> that asserts the correctness of the model following execution of generated
> traffic. See CASSANDRA-15348 for additional details.
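The model / generator / validator split described above can be sketched in a few lines. This is not Harry's API: the class and function names are hypothetical, the "database" is an in-memory dictionary, and the operations are simple key/value writes and deletes. The point is only that a seed makes the generated traffic reproducible, and that the validator compares what the system returns against what the model expects.

```python
import random


def generate_operations(seed, count, key_space=16):
    """Reproducibly generate pseudo-random write/delete operations from a seed."""
    rng = random.Random(seed)
    ops = []
    for _ in range(count):
        key = rng.randrange(key_space)
        if rng.random() < 0.2:
            ops.append(("delete", key, None))
        else:
            ops.append(("write", key, rng.randrange(1_000_000)))
    return ops


class Model:
    """Holds the state the database is expected to contain."""

    def __init__(self):
        self.expected = {}

    def apply(self, op):
        kind, key, value = op
        if kind == "write":
            self.expected[key] = value
        else:
            self.expected.pop(key, None)


class ToyDatabase:
    """Hypothetical in-memory stand-in for the system under test."""

    def __init__(self):
        self.rows = {}

    def execute(self, op):
        kind, key, value = op
        if kind == "write":
            self.rows[key] = value
        else:
            self.rows.pop(key, None)

    def read_all(self):
        return dict(self.rows)


def validate(model, database):
    """Return (key, expected, actual) for every key where the two disagree."""
    actual = database.read_all()
    keys = sorted(model.expected.keys() | actual.keys())
    return [
        (key, model.expected.get(key), actual.get(key))
        for key in keys
        if model.expected.get(key) != actual.get(key)
    ]


def run(seed, count, database):
    """Apply generated traffic to both model and database, then validate."""
    model = Model()
    for op in generate_operations(seed, count):
        model.apply(op)
        database.execute(op)
    return validate(model, database)
```

Because the operations depend only on the seed, a failing run can be replayed exactly by reusing the same seed and operation count.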
> h4. Local Read/Write Path: IndexInfo (CASSANDRA-11206)
> Users upgrading from Cassandra 3.0.x to trunk will pick up CASSANDRA-11206 in
> the process. Contributors to 4.0 testing and validation have allocated time
> to testing and validation of these changes via source audit and
> implementation of property-based tests (currently underway). The majority of
> planned work here is complete, with a final set of perf tests in progress. No
> correctness issues were identified via the source audit and randomized
> testing. Minor cleanup and refactoring may follow, but these changes are
> expected to be small in scope, if any.
> h4. Local Read/Write Path: Upgrade and Diff Test
> Execution of upgrade and diff tests via cassandra-diff has proven to be one
> of the most effective approaches toward identifying issues with the local
> read/write path. These include instances of data loss, data corruption, data
> resurrection, incorrect responses to queries, incomplete responses, and
> others. Upgrade and diff tests can be executed concurrently with fault
> injection (such as host or network failure), as well as during mixed-version
> scenarios (such as upgrading half of the instances in a cluster, and running
> upgradesstables on only half of the upgraded instances).
> Upgrade and diff tests are expected to continue through the release cycle,
> and are a great way for contributors to gain confidence in the correctness of
> the database under their own workloads.
> h4. Local Read/Write Path: Other Areas
> Testing in this area refers to the local read/write path (StorageProxy,
> ColumnFamilyStore, Memtable, SSTable reading/writing, etc). We are still
> finding numerous bugs and issues with the 3.0 storage engine rewrite
> (CASSANDRA-8099). For 4.0 we want to ensure that we thoroughly cover the
> local read/write path with techniques such as property-based testing,
> fuzzing, and a source audit.
> h4. Distributed Read/Write Path: Coordination, Replication, and Read Repair
> Testing in this area focuses on non-node-local aspects of the read-write
> path: coordination, replication, read repair, etc.
> h4. Repair
> We aim for 4.0 to have the first fully functioning incremental repair
> solution (CASSANDRA-9143)! Furthermore, we aim to verify that all types of
> repair (full range, subrange, incremental) function as expected, and that
> community tools such as Reaper work. CASSANDRA-3200 adds an
> experimental option to reduce the amount of data streamed during repair; we
> should write more tests and see how it works with big nodes.
> h4. Compaction
> Alongside the local and distributed read/write paths, we'll also want to
> validate compaction. CASSANDRA-6696 introduced substantial
> changes/improvements that require testing (esp. JBOD).
> h4. Metrics
> In past releases we've unknowingly broken metrics integrations and introduced
> performance regressions in metrics collection and reporting. We strive in 4.0
> to not do that. Metrics should work well!
> h4. Tooling: Bundled / First-Party
> Test plans should cover bundled first-party tooling and CLIs such as
> nodetool, cqlsh, and new tools supporting full query and audit logging
> (CASSANDRA-13983, CASSANDRA-12151).
> h4. Tooling: External Ecosystem
> Many users of Apache Cassandra employ open source tooling to automate
> Cassandra configuration, runtime management, and repair scheduling. Prior to
> release, we need to confirm that popular third-party tools such as Reaper,
> Priam, etc. function properly.
> h4. Test Frameworks, Tooling, Infrastructure / Automation
> This area refers to contributions to test frameworks/tooling (e.g., dtests,
> QuickTheories, CASSANDRA-14821), and automation enabling those tools to be
> applied at scale (e.g., replay testing via Spark-based replay of captured FQL
> logs).
> h4. Cluster Setup and Maintenance
> We want 4.0 to be easy for users to set up out of the box and just work. This
> means having low friction when users download the Cassandra package and start
> running it. For example, users should be able to easily configure and start
> new 4.0 clusters and have tokens distributed evenly. Another example is
> packaging: it should be easy to install Cassandra on all supported platforms
> and have Cassandra use standard platform integrations.
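As a rough illustration of what "tokens distributed evenly" means, the sketch below computes evenly spaced tokens and the resulting ownership per node. It assumes the Murmur3Partitioner token range (-2^63 to 2^63 - 1) and a single token per node; clusters using multiple tokens per node rely on Cassandra's own token allocation instead, and the function names here are hypothetical.

```python
# Murmur3Partitioner token range: -2**63 .. 2**63 - 1 (2**64 tokens in total).
MIN_TOKEN = -(2**63)
RING_SIZE = 2**64


def evenly_spaced_tokens(node_count):
    """Return one token per node, spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [MIN_TOKEN + (RING_SIZE * i) // node_count for i in range(node_count)]


def ownership_fractions(tokens):
    """Fraction of the ring owned by each token, in sorted token order.

    Each token owns the span from the previous token (exclusive) up to itself,
    wrapping around the ring for the first token.
    """
    ordered = sorted(tokens)
    fractions = []
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        span = (token - previous) % RING_SIZE
        if span == 0:  # a single node owns the whole ring
            span = RING_SIZE
        fractions.append(span / RING_SIZE)
    return fractions
```

A test for this area could assert that, for a freshly configured cluster, the ownership fractions stay within some tolerance of `1 / node_count`.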
> h4. Platforms / Runtimes
> CASSANDRA-9608 introduces support for Java 11. We'll want to verify that
> Cassandra under Java 11 meets expectations of stability.
> h4. Cluster Upgrade
> We've historically had numerous bugs concerning upgrading clusters from one
> version to another. Let's establish the supported upgrade path and ensure
> that users can safely perform the upgrades in production.
> h4. Documentation
> Many sections of our documentation are incomplete or wrong. Let's deliver a
> functional but also well-documented 4.0 release.
> h4. Features / Substantial Changes
> Transient Replication
> Transient Replication is an experimental implementation of witness replicas
> included in Apache Cassandra 4.0 (CASSANDRA-14697). As this feature is
> experimental, the focus of testing and validation in this release will be
> toward ensuring that its implementation doesn't negatively impact
> non-transient use cases.