Re: [DISCUSS] CEP-10: Cluster and Code Simulations

Sam Tunnicliffe Tue, 13 Jul 2021 01:37:42 -0700

Spoiler alert: I am pretty familiar with the proposal and the off-list work 
that has been done toward it.


From my perspective, I have no qualms about putting this CEP up for a vote. 
Having seen the potential (and, to some degree, realised) benefit of this 
proposal, I am convinced of its value.

Thanks,
Sam

> On 13 Jul 2021, at 09:20, bened...@apache.org wrote:
> 
> Did anyone have any thoughts on this CEP, or shall I bring it forward for a 
> vote also?
> 
> From: bened...@apache.org <bened...@apache.org>
> Date: Thursday, 3 June 2021 at 20:19
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: [DISCUSS] CEP-10: Cluster and Code Simulations
> Proposal for a mechanism to evaluate whole clusters, or individual classes, 
> with a deterministically pseudorandom ordering of all thread and message 
> events.
> 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-10%3A+Cluster+and+Code+Simulations
> 
> Evaluating the correctness of distributed systems is hard, as I’m sure every 
> developer on this list appreciates. As the project has matured, we have had 
> to grapple more with the guarantees we provide users for features we develop, 
> and the semantics we promise, particularly around edge-cases between two 
> mechanisms or systems.
> 
> This work aims to dramatically reduce the project overhead necessary for 
> delivering a bug-free Cassandra.
> 
> The premise is to intercept all relevant events that could be performed in a 
> different order, i.e. primarily message delivery and thread events such as 
> executor submission, signalling of threads, lock acquisition and release, and 
> even volatile reads and writes (to a lesser extent). These events are then 
> scheduled pseudo-randomly (with various restrictions to ensure a valid 
> execution), or in some cases not evaluated at all (to simulate e.g. messages 
> being lost). The result is a repeatable sequential evaluation of a 
> multi-threaded, multi-actor system.
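
[Editor's illustration] The premise quoted above can be sketched in a few lines of Java. This is only a minimal sketch under stated assumptions, not the mechanism proposed in the CEP: the class DeterministicScheduler, its methods, and the dropChance parameter are all hypothetical names, and it ignores the "various restrictions to ensure a valid execution" that a real simulator would need. It shows only that queuing intercepted events and choosing the next one with a seeded generator yields a repeatable sequential execution, including repeatable message loss.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: none of these names come from the CEP or from Cassandra.
final class DeterministicScheduler {
    private final Random random;       // seeded, so every choice is repeatable
    private final double dropChance;   // probability that an event is never evaluated
    private final List<Runnable> pending = new ArrayList<>();

    DeterministicScheduler(long seed, double dropChance) {
        this.random = new Random(seed);
        this.dropChance = dropChance;
    }

    // Intercepted events (message deliveries, executor tasks, ...) are queued, not run.
    void intercept(Runnable event) {
        pending.add(event);
    }

    // Run queued events one at a time, in an order chosen by the seeded generator.
    void runAll() {
        while (!pending.isEmpty()) {
            Runnable next = pending.remove(random.nextInt(pending.size()));
            if (random.nextDouble() < dropChance) {
                continue; // simulate e.g. a lost message
            }
            next.run(); // the event may itself intercept further events
        }
    }

    // Record the order in which ten labelled events run for a given seed.
    static List<Integer> trace(long seed, double dropChance) {
        DeterministicScheduler scheduler = new DeterministicScheduler(seed, dropChance);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            final int id = i;
            scheduler.intercept(() -> order.add(id));
        }
        scheduler.runAll();
        return order;
    }
}
```

Calling DeterministicScheduler.trace with the same seed and drop chance twice returns the same list, which is the property that makes a failing run reproducible from its seed alone.
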
> 
> This permits us to evaluate a much broader range of cluster behaviours 
> without additional development work, and to implement a broad range of 
> property-based and related randomized acceptance tests without significant 
> developer burden.
> 
> The work will apply just as readily to multi-threaded single classes as it 
> will to whole clusters, and will come with a linearizability test for LWTs as 
> well as a unit test for an existing multi-threaded bug that is otherwise hard 
> to exhibit.
> 
> To achieve this, significant modifications will be required to the codebase, 
> mostly cleaning up existing abstractions. Specifically, we will need to be 
> able to mock executors, any blocking concurrency primitives, time, filesystem 
> access and internode streaming.
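
[Editor's illustration] As one example of the kind of abstraction the paragraph above refers to, time can be hidden behind an interface so that a simulation controls it. The following is a minimal sketch, assuming hypothetical names (MonotonicClock, SimulatedClock, Timeout) that are not taken from the CEP or the Cassandra codebase; the actual abstractions may look quite different.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical names, for illustration only; not the abstraction proposed in the CEP.
interface MonotonicClock {
    long nanoTime();
}

// Production implementation: delegates to the JVM's monotonic clock.
final class SystemMonotonicClock implements MonotonicClock {
    public long nanoTime() {
        return System.nanoTime();
    }
}

// Simulation implementation: time moves only when the simulation advances it.
final class SimulatedClock implements MonotonicClock {
    private long nowNanos;

    public long nanoTime() {
        return nowNanos;
    }

    void advance(long duration, TimeUnit unit) {
        nowNanos += unit.toNanos(duration);
    }
}

// Code under test depends on the interface, never on System.nanoTime() directly.
final class Timeout {
    private final MonotonicClock clock;
    private final long deadlineNanos;

    Timeout(MonotonicClock clock, long duration, TimeUnit unit) {
        this.clock = clock;
        this.deadlineNanos = clock.nanoTime() + unit.toNanos(duration);
    }

    boolean expired() {
        // Subtraction form stays correct even if nanoTime values wrap around.
        return clock.nanoTime() - deadlineNanos >= 0;
    }
}
```

With a SimulatedClock, a test can create a five-second Timeout, confirm that it has not expired, advance the clock by five seconds, and confirm that it has, without any real waiting and with the same result on every run.
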
> 
> The work is – in large part – already complete, with JIRA and PRs to follow 
> in the coming weeks. Of course, the work is subject to the usual community 
> input and review, so this does not preclude changes to the work (even 
> significant ones, if they are warranted). I know a lot of incoming CEPs are 
> likely to be backed up by significant off-list development as a result of the 
> focus on a shippable 4.0. Hopefully this is just a temporary growing pain, 
> particularly as we move towards a shippable trunk.
> 
> I hope this work will be of huge value to the project, particularly as we 
> race to catch up on years of limited feature development.
> 
> JIRA and PRs will follow, but I wanted to kick off discussion in advance.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
