[ https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759697#comment-17759697 ]
Denis Chudov edited comment on IGNITE-20124 at 8/29/23 7:24 AM:
----------------------------------------------------------------
I made a test based on ItTxDistributedTestSingleNode (single node, so that the
updates on replication are not necessary) which runs multiple updates within one
transaction (to exclude tx begin and tx commit calls) for 1 minute, in 3
variants: double updates, no updates on replication, and updates of only indexes
on replication.
The results were approximately the same (the difference in total tx count was
below 2%), so this optimization will most likely be postponed, because the
expected effect is not significant.
Code of the test:
{code:java}
@Test
public void testBench() throws InterruptedException, ExecutionException, TimeoutException {
    List<Future<?>> futs = new ArrayList<>();
    int cores = Runtime.getRuntime().availableProcessors();
    ExecutorService executorService = Executors.newFixedThreadPool(cores);
    AtomicBoolean stopped = new AtomicBoolean();
    LongAdder adder = new LongAdder();

    // Warm up: run the same workload for 1 second in a separate transaction.
    Transaction tx = igniteTransactions.begin();

    for (int i = 0; i < cores; i++) {
        int finalI = i;
        futs.add(executorService.submit(r(tx, stopped, finalI, adder)));
    }

    Thread.sleep(1000);

    stopped.set(true);

    executorService.shutdown();
    executorService.awaitTermination(10, TimeUnit.SECONDS);

    for (Future<?> f : futs) {
        f.get(10, TimeUnit.SECONDS);
    }

    tx.commit();

    tx = igniteTransactions.begin();
    executorService = Executors.newFixedThreadPool(cores);
    stopped.set(false);
    futs.clear();
    adder.reset();

    // Test: run the upsert loop for 1 minute within a single transaction.
    System.out.println("benchmark started");

    for (int i = 0; i < cores; i++) {
        int finalI = i;
        futs.add(executorService.submit(r(tx, stopped, finalI, adder)));
    }

    Thread.sleep(60_000);

    stopped.set(true);

    System.out.println("benchmark ended");

    executorService.shutdown();
    executorService.awaitTermination(10, TimeUnit.SECONDS);

    for (Future<?> f : futs) {
        f.get(10, TimeUnit.SECONDS);
    }

    tx.commit();

    System.out.println("qqq tx count: " + adder.longValue());
}

private Runnable r(Transaction tx, AtomicBoolean stopped, int i, LongAdder adder) {
    return () -> {
        int j = 0;

        while (!stopped.get()) {
            // Each worker writes to its own key range: i * Integer.MAX_VALUE + j.
            accounts.recordView().upsert(tx, makeValue((long) i * Integer.MAX_VALUE + j, 0.0));

            j++;
            adder.add(1);
        }
    };
}{code}
> Prevent double storage updates within primary
> ---------------------------------------------
>
> Key: IGNITE-20124
> URL: https://issues.apache.org/jira/browse/IGNITE-20124
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexander Lapin
> Priority: Major
> Labels: ignite-3, transactions
>
> h3. Motivation
> In order to preserve the guarantee that the primary replica is always
> up-to-date, it's required to:
> * In case of a common RW transaction - insert a write intent into the storage
> on the primary before replication.
> * In case of a one-phase commit - insert a committed write after the replication.
> Both have already been done. However, this means that if the primary is part of
> the replication group (which is true in almost all cases), we will double the
> update:
> * In case of a common RW transaction - through the replication.
> * In case of a one-phase commit - either through the replication, or through
> the post-update, if the replication was fast enough.
> h3. Definition of Done
> * Prevent double storage updates within primary.
> h3. Implementation Notes
> The easiest way to prevent a double insert is to skip one if the local safe
> time is greater than or equal to the candidate's. There are 3 places where we
> update the partition storage (a sketch of the check follows the list):
> # Primary pre-replication update. In that case, the second update on
> replication should be excluded.
> # Primary post-replication update in case of 1PC. It's possible to see
> already updated data if the replication was already processed locally. This is
> expected to be covered by
> https://issues.apache.org/jira/browse/IGNITE-15927 . We should check the
> primary's safe time on the post-replication update and skip the update if the
> safe time is already adjusted.
> # Insert through replication. In the non-1PC case, there will be a double
> insert on every primary (see 1). In the 1PC case it depends, so we should check
> the safe time on the primary to know whether the update should be done (see 2).
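> A minimal sketch of the safe-time check, assuming simplified types (plain long
> timestamps and illustrative names - this is not the real Ignite 3 API):
> {code:java}
> import java.util.concurrent.atomic.AtomicLong;
>
> // Hypothetical guard: a storage write is applied only if the local safe time
> // does not already cover the command's timestamp.
> class SafeTimeGuard {
>     private final AtomicLong safeTime = new AtomicLong();
>
>     /** Returns true if the write for the given command timestamp still has to be applied. */
>     boolean shouldApply(long commandTs) {
>         return safeTime.get() < commandTs;
>     }
>
>     /** Advances the safe time monotonically. */
>     void advance(long ts) {
>         safeTime.accumulateAndGet(ts, Math::max);
>     }
> }
> {code}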
> In every case, the storage indexes should still be adjusted on replication,
> as is done now, because the progress of indexes on FSM write operations must
> not be violated - otherwise, a Raft snapshot-based rebalance would be broken.
> This means we may have two non-consistent storage updates on the primary which
> may affect different fsyncs, so maybe we should benchmark this optimization to
> find out how useful it is. Transactional correctness isn't violated by these
> non-consistent storage updates: the only possibility is that some writes or
> write intents go ahead of the indexes and are therefore included into
> snapshots - however, we can still process such writes and resolve such write
> intents. The index-only adjustment is sketched below.
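> A sketch of the index-only adjustment on replication, reusing the hypothetical
> SafeTimeGuard above; the Storage interface here is illustrative, not the real
> Ignite 3 storage API:
> {code:java}
> // Illustrative storage interface: an MVCC row write plus the applied index/term counters.
> interface Storage {
>     void addWrite(long rowId, byte[] row);   // write (intent) to the partition storage
>     void lastApplied(long index, long term); // advance the applied index/term
> }
>
> final class ReplicationApplier {
>     /** Replication (FSM) handler: the row write may be skipped, the indexes never are. */
>     static void apply(Storage storage, SafeTimeGuard guard,
>             long commandTs, long index, long term, long rowId, byte[] row) {
>         if (guard.shouldApply(commandTs)) {
>             storage.addWrite(rowId, row); // the primary did not pre-apply this write
>         }
>
>         // Always advance the indexes so that a Raft snapshot captures the correct log position.
>         storage.lastApplied(index, term);
>
>         guard.advance(commandTs);
>     }
> }
> {code}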
> Also, the safe time now needs to be updated on the primary replica. There can
> be the following scenarios:
> # Two-phase commit: we can advance the safe time on the primary, make the
> pre-replication update, and then run the Raft command. Both the safe time
> adjustment and the storage update happen before replication.
> # One-phase commit: the safe time should be advanced after the completion of
> the Raft command's future. There is no happens-before between the future
> callback and the replication handler, so the safe time should be checked and
> advanced in both places. We should use a critical section for the given
> transaction, preventing a race between the safe time check, the safe time
> adjustment, and the storage update (see the sketch after this list).
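> A sketch of the per-transaction critical section for 1PC, again using the
> hypothetical types from the sketches above: both the Raft-future callback and
> the local replication handler call applyOnce, and the lock makes "check safe
> time, write, advance" atomic:
> {code:java}
> // Hypothetical: both racing paths (future callback and replication handler) go through here.
> class OnePhaseCommitApplier {
>     private final Object mutex = new Object();
>     private final SafeTimeGuard guard;
>     private final Storage storage;
>
>     OnePhaseCommitApplier(SafeTimeGuard guard, Storage storage) {
>         this.guard = guard;
>         this.storage = storage;
>     }
>
>     /** Applies the write exactly once, whichever path gets here first. */
>     void applyOnce(long commandTs, long rowId, byte[] row) {
>         synchronized (mutex) {
>             // Check and advance under one lock, so the second caller sees the
>             // adjusted safe time and skips the write.
>             if (guard.shouldApply(commandTs)) {
>                 storage.addWrite(rowId, row);
>                 guard.advance(commandTs);
>             }
>         }
>     }
> }
> {code}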