On Sat, Nov 26, 2016 at 8:32 PM Olivier El Mekki <oliv...@el-mekki.com>
wrote:

>
> This could yield interesting properties, because this means that even
> using channels, maybe we could write concurrent code and still have a
> chance to have predictable and reproducible execution in testing
> environment.
>
>
>
I think it is a bit more complex:

Having a deterministic execution of serial & sequential nature is good for
establishing a base-point of what a given routine is going to do. The key
point of a concurrent/parallel environment is that there should be no
discrepancy between this base-point and any possible execution order of
concurrent goroutines.

In short, you want to randomize the scheduler to run highly unlikely
schedules. If any of these fails to produce a correct result, you want to
know what order the scheduler executed the goroutines in. You also want the
system to automatically degrade the concurrent schedule into a sequential
one, so you can find the exact interleaving which is the culprit of the
problem.

The reason this is possible has to do with linearizability. An external
party, observing the sequential system will see a trace of result-events
from the system in the test. A correctness criterion which is necessary but
not sufficient is that the concurrent execution of the same program should
produce a valid linearizable trace. That is, given a trace of observations,
we must ask "is there any of the sequential executions which can give such
a trace?". If yes, then the trace is valid. If no, we can report an error.

It isn't as far-fetched as one might think. For example, Kyle Kingsbury are
using these ideas in his Jepsen tool, for testing the linearizability (and
other) properties of distributed databases. In his setup, you record a
trace of events from a distributed database. The database runs virtually in
VMs and the network is forcefully cut via firewall rules. Afterwards, a
tool, Knossos, tries to find errors in the traces by looking for invalid
observations. It proves to be a very efficient way to test the
PR-documentation of distributed databases and verify the claims of
guarantees the databases have. Usually, vendors are trying to hand-wave
themselves through impossibility results from distributed computing (such
as the CAP theorem).

Another example is PULSE & Concuerror from the Erlang world. A program is
compiled with instrumentation. This instrumentation inserts a control-point
before each possible blocking request, so a separate scheduler can handle
the actual interleaving of the program. Now, the properties of concurrent
execution can be studied by forcing schedules in a certain order. The
crucial trick in these systems is a function

func split(seed int) (seedL int, seedR int)
which can split the current PRNG seed into two seeds, left and right. It is
needed to simplify a schedule once a failing schedule is found. Imagine you
were able to draw the sequence-diagram: you want it to be as small as
possible since that is easier to understand.

Whenever you start up a possible subcomputation, you split the seed. Now,
if you want to throw away the subcomputation, you throw away either seedL
or seedR. This keeps the PRNG intact for the other half. If you had a
standard linear seed (rather than a tree which the above essentially is)
the problem is that you have to skip a lot of random values in the PRNG for
the subcomputations you don't use. This allows you to ignore
subcomputations which are not relevant to the current error, throw them
away and continue as if they never happened. In particular, you can throw
away operations from the trace by excising them out—simplifying things in
the process.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to