[
https://issues.apache.org/jira/browse/CASSANDRA-15348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adam Holmberg updated CASSANDRA-15348:
--------------------------------------
Complexity: Challenging
> Harry: generator library and extensible framework for fuzz testing Apache
> Cassandra
> -----------------------------------------------------------------------------------
>
> Key: CASSANDRA-15348
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15348
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Alex Petrov
> Assignee: Alex Petrov
> Priority: Normal
> Fix For: 4.0-beta
>
>
> h2. Description:
> This ticket introduces Harry, a component for fuzz testing and verification
> of Apache Cassandra clusters at scale.
> h2. Motivation:
> Current testing tooling largely tests for common and edge cases, and most of
> the tests use predefined datasets. Property-based tests can help explore a
> broader range of states, but often require either a complex model or a large
> state to test against.
> h2. What problems Harry solves:
> Harry makes it possible to run tests that validate the state of both dense
> nodes (to test the local read-write path) and large clusters (to test the
> distributed read-write path), and to do so efficiently. Its main goals, which
> also set it apart from other testing tools, are:
> * The state required for verification should remain as compact as possible.
> * The verification process itself should be as performant as possible.
> * Ideally, we'd want a way to verify database state while _continuing_ to
> run state-changing queries against it.
> h2. What Harry does:
> To achieve this, Harry defines a model that holds the state of the database,
> generators that produce reproducible, pseudo-random schemas, mutations, and
> queries, and a validator that asserts the correctness of the model following
> execution of generated traffic.
> h2. Harry consists of multiple reusable components:
> * Generator library: how to create a library of invertible, order-preserving
> generators for simple and composite data types.
> * Model and checker: how to use the properties of generators to validate the
> output of an eventually-consistent database in linear time.
> * Runner library: how to create a scheme for reproducible runs, despite the
> concurrent nature of the database and the fuzzer itself.
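One way to picture a reproducible run is sketched below in Python. It is not taken from Harry's code base: the names `mix64` and `descriptor_at` are hypothetical, and the mixing step is a SplitMix64-style scramble chosen only for illustration. The assumption is that every operation is identified by a seed and a logical sequence number, so its descriptor can be recomputed at any time instead of being stored.

```python
MASK64 = (1 << 64) - 1


def mix64(x):
    """SplitMix64-style scramble: a bijection on 64-bit integers (illustrative)."""
    z = (x + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def descriptor_at(seed, sequence_number):
    """Descriptor of the n-th operation; depends only on (seed, sequence_number)."""
    return mix64(mix64(seed) ^ sequence_number)


# Two "runs" with the same seed produce the same descriptors, regardless of
# the order in which the sequence numbers are visited.
forward = [descriptor_at(1, n) for n in range(5)]
backward = [descriptor_at(1, n) for n in reversed(range(5))]
```

Because each descriptor is a pure function of its inputs, concurrent workers could compute them in any interleaving and a failing run could be replayed from the seed alone, under the assumptions stated above.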
> h2. Short and somewhat approximate description of how Harry achieves this:
> Generation and validation define strict mathematical relations between the
> generated values and pseudorandom numbers they were generated from. Using
> these properties, we can store minimal state and check if these properties
> hold during validation.
> Since Cassandra stores data in rows, we should be able to "inflate" data to
> insert a row into the database from a single number we call _descriptor_.
> Each value in the row read from the database can be "deflated" back to the
> descriptor it was generated from. This way, to precisely verify the state of
> the row, we only need to know the descriptor it was generated from and a
> timestamp at which it was inserted.
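A minimal Python sketch of the "inflate"/"deflate" relation described above. It is an illustration under stated assumptions, not Harry's implementation: the function names are hypothetical, every column is a 64-bit integer, and the invertible mapping is multiplication by an odd constant modulo 2**64.

```python
MASK64 = (1 << 64) - 1
MULTIPLIER = 0x9E3779B97F4A7C15          # odd, hence invertible modulo 2**64
INVERSE = pow(MULTIPLIER, -1, 1 << 64)   # modular inverse; requires Python 3.8+


def column_value(descriptor, column_index):
    """Inflate: derive one column's value from the row descriptor."""
    return ((descriptor ^ column_index) * MULTIPLIER) & MASK64


def descriptor_of(value, column_index):
    """Deflate: recover the descriptor a column value was generated from."""
    return ((value * INVERSE) & MASK64) ^ column_index


def inflate_row(descriptor, column_count):
    """Build all column values of a row from a single descriptor."""
    return [column_value(descriptor, i) for i in range(column_count)]


def deflate_row(row):
    """Return the row's descriptor, or raise if the columns disagree."""
    descriptors = {descriptor_of(v, i) for i, v in enumerate(row)}
    if len(descriptors) != 1:
        raise ValueError("row was not generated from a single descriptor")
    return descriptors.pop()


row = inflate_row(42, 3)
recovered = deflate_row(row)   # the descriptor the row was inflated from
```

With such a mapping, a checker only needs to remember the descriptor and the write timestamp; any row whose columns do not deflate to one common descriptor is flagged as inconsistent.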
> Similarly, keys for the inserted row can be "inflated" from a single 64-bit
> integer, and then "deflated" back to it. To efficiently search for keys,
> while allowing range scans, our generation scheme preserves the order of the
> original 64-bit integer. Every pair of keys generated from two 64-bit
> integers would sort the same way as these integers.
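The order-preserving property can be sketched in Python as follows. This is a simplified illustration, not Harry's generator: the function names are hypothetical, and the key types (a fixed-width hex string, and a two-part composite of a string and an integer) are assumptions picked because their natural sort order matches the order of the unsigned 64-bit integer they come from.

```python
import random


def inflate_key(n):
    """Inflate an unsigned 64-bit integer into a fixed-width text key."""
    if not 0 <= n < (1 << 64):
        raise ValueError("expected an unsigned 64-bit integer")
    # Zero-padded lowercase hex sorts lexicographically in numeric order.
    return format(n, "016x")


def deflate_key(key):
    """Recover the 64-bit integer a text key was inflated from."""
    return int(key, 16)


def inflate_composite_key(n):
    """Inflate into a (text, int) pair: high 32 bits first, then low 32 bits."""
    return (format(n >> 32, "08x"), n & 0xFFFFFFFF)


def deflate_composite_key(key):
    """Recover the 64-bit integer from a (text, int) composite key."""
    high, low = key
    return (int(high, 16) << 32) | low


rng = random.Random(0)
numbers = [rng.getrandbits(64) for _ in range(1000)] + [0, (1 << 64) - 1]
# Sorting by the inflated keys yields the same order as sorting the integers.
same_order_simple = sorted(numbers, key=inflate_key) == sorted(numbers)
same_order_composite = sorted(numbers, key=inflate_composite_key) == sorted(numbers)
```

Since the order is preserved, a range scan over inflated keys corresponds to a contiguous range of 64-bit integers, which is what allows the checker to search on the compact representation.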
> This way, in order to validate the state of a range of rows queried from the
> database, it is sufficient to "deflate" their keys and data values, use the
> deflated 64-bit key representation to find all descriptors these rows were
> generated from, and ensure that the given sequence of descriptors could have
> resulted in the state that the database has responded with.
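The range validation described above might look roughly like the following Python sketch. It is deliberately simplified and is not Harry's checker: the `Model` class and helper names are hypothetical, rows have a single value column, and the reconciliation rule is assumed to be whole-row last-write-wins by timestamp (no deletes, no in-flight writes).

```python
import bisect

MASK64 = (1 << 64) - 1
MULTIPLIER = 0x9E3779B97F4A7C15          # odd, hence invertible modulo 2**64
INVERSE = pow(MULTIPLIER, -1, 1 << 64)   # requires Python 3.8+


def inflate_key(n):
    return format(n, "016x")             # order-preserving text key


def deflate_key(key):
    return int(key, 16)


def inflate_value(descriptor):
    return (descriptor * MULTIPLIER) & MASK64


def deflate_value(value):
    return (value * INVERSE) & MASK64


class Model:
    """Keeps only integers: per key, the latest (timestamp, descriptor)."""

    def __init__(self):
        self.latest = {}
        self.sorted_keys = []

    def record_write(self, key, timestamp, descriptor):
        if key not in self.latest:
            bisect.insort(self.sorted_keys, key)
            self.latest[key] = (timestamp, descriptor)
        elif timestamp > self.latest[key][0]:
            self.latest[key] = (timestamp, descriptor)

    def expected(self, low, high):
        """Expected (key, descriptor) pairs for keys in [low, high], in order."""
        start = bisect.bisect_left(self.sorted_keys, low)
        end = bisect.bisect_right(self.sorted_keys, high)
        return [(k, self.latest[k][1]) for k in self.sorted_keys[start:end]]


def validate_range(model, low, high, rows):
    """rows: (inflated key, inflated value) pairs as returned by the database."""
    observed = [(deflate_key(k), deflate_value(v)) for k, v in rows]
    return observed == model.expected(low, high)


model = Model()
model.record_write(5, timestamp=1, descriptor=100)
model.record_write(5, timestamp=2, descriptor=200)   # overwrites the first write
model.record_write(9, timestamp=1, descriptor=300)
response = [(inflate_key(5), inflate_value(200)), (inflate_key(9), inflate_value(300))]
ok = validate_range(model, 0, 10, response)
```

The comparison is a single pass over the returned rows, and the model never stores inflated data; a stale value or a missing row makes `validate_range` return `False`.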
> Using this scheme, we keep the minimum possible amount of data per row, can
> efficiently generate the data, and backtrack values to the numbers they were
> generated from. Most of the time, we operate on 64-bit integer values and
> only use "inflated" objects when running queries against database state,
> minimizing the amount of required memory.
> h2. Name:
> Harry (verb).
> According to Merriam-Webster:
> * to torment by or as if by constant attack
> * persistently carry out attacks on (an enemy or an enemy's territory)