[ https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759697#comment-17759697 ]
Denis Chudov edited comment on IGNITE-20124 at 8/29/23 7:24 AM:
----------------------------------------------------------------
I made a test based on ItTxDistributedTestSingleNode (single node, so that the
updates on replication are not necessary) which runs multiple updates within one
transaction (to exclude tx begin and tx commit calls) for 1 minute, in 3
variants: double updates, no updates on replication, and updates of only indexes
on replication.
The results were approximately the same (the difference in total tx count was
below 2%), so this optimization will most likely be postponed, because the
expected effect is not significant.
Code of the test:
{code:java}
@Test
public void testBench() throws InterruptedException, ExecutionException, TimeoutException {
    List<Future<?>> futs = new ArrayList<>();
    int cores = Runtime.getRuntime().availableProcessors();
    ExecutorService executorService = Executors.newFixedThreadPool(cores);
    AtomicBoolean stopped = new AtomicBoolean();
    LongAdder adder = new LongAdder();

    // Warm up: run the same workload for 1 second in a separate transaction.
    Transaction tx = igniteTransactions.begin();

    for (int i = 0; i < cores; i++) {
        int finalI = i;
        futs.add(executorService.submit(r(tx, stopped, finalI, adder)));
    }

    Thread.sleep(1000);

    stopped.set(true);

    executorService.shutdown();
    executorService.awaitTermination(10, TimeUnit.SECONDS);

    for (Future<?> f : futs) {
        f.get(10, TimeUnit.SECONDS);
    }

    tx.commit();

    tx = igniteTransactions.begin();
    executorService = Executors.newFixedThreadPool(cores);
    stopped.set(false);
    futs.clear();
    adder.reset();

    // Test: run the upsert loop for 1 minute within a single transaction.
    System.out.println("benchmark started");

    for (int i = 0; i < cores; i++) {
        int finalI = i;
        futs.add(executorService.submit(r(tx, stopped, finalI, adder)));
    }

    Thread.sleep(60_000);

    stopped.set(true);

    System.out.println("benchmark ended");

    executorService.shutdown();
    executorService.awaitTermination(10, TimeUnit.SECONDS);

    for (Future<?> f : futs) {
        f.get(10, TimeUnit.SECONDS);
    }

    tx.commit();

    System.out.println("qqq tx count: " + adder.longValue());
}

private Runnable r(Transaction tx, AtomicBoolean stopped, int i, LongAdder adder) {
    return () -> {
        int j = 0;

        while (!stopped.get()) {
            // Each worker writes to its own key range: i * Integer.MAX_VALUE + j.
            accounts.recordView().upsert(tx, makeValue((long) i * Integer.MAX_VALUE + j, 0.0));

            j++;
            adder.add(1);
        }
    };
}{code}
> Prevent double storage updates within primary
> ---------------------------------------------
>
> Key: IGNITE-20124
> URL: https://issues.apache.org/jira/browse/IGNITE-20124
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexander Lapin
> Priority: Major
> Labels: ignite-3, transactions
>
> h3. Motivation
> In order to preserve the guarantee that the primary replica is always
> up-to-date, it's required to:
> * In case of a common RW transaction - insert a write intent into the storage
> on the primary before replication.
> * In case of a one-phase commit - insert a committed write after the replication.
> Both have already been done. However, this means that if the primary is part of
> the replication group (which is true in almost all cases), we will double the
> update:
> * In case of a common RW transaction - through the replication.
> * In case of a one-phase commit - either through the replication, or through
> the post-update, if the replication was fast enough.
> h3. Definition of Done
> * Prevent double storage updates within primary.
> h3. Implementation Notes
> The easiest way to prevent a double insert is to skip one if the local safe
> time is greater than or equal to the candidate's. There are 3 places where we
> update the partition storage (a sketch of the check follows the list):
> # Primary pre-replication update. In that case, the second update on
> replication should be excluded.
> # Primary post-replication update in case of 1PC. It's possible to see
> already updated data if the replication was already processed locally. This is
> expected to be covered by
> https://issues.apache.org/jira/browse/IGNITE-15927 . We should check the
> primary's safe time on the post-replication update and skip the update if the
> safe time is already adjusted.
> # Insert through replication. In the non-1PC case, there will be a double
> insert on every primary (see 1). In the 1PC case it depends, so we should check
> the safe time on the primary to know whether the update should be done (see 2).
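> A minimal sketch of the safe-time check, assuming simplified types (plain long
> timestamps and illustrative names - this is not the real Ignite 3 API):
> {code:java}
> import java.util.concurrent.atomic.AtomicLong;
>
> // Hypothetical guard: a storage write is applied only if the local safe time
> // does not already cover the command's timestamp.
> class SafeTimeGuard {
>     private final AtomicLong safeTime = new AtomicLong();
>
>     /** Returns true if the write for the given command timestamp still has to be applied. */
>     boolean shouldApply(long commandTs) {
>         return safeTime.get() < commandTs;
>     }
>
>     /** Advances the safe time monotonically. */
>     void advance(long ts) {
>         safeTime.accumulateAndGet(ts, Math::max);
>     }
> }
> {code}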
> In every case, the storage indexes should still be adjusted on replication,
> as is done now, because the progress of indexes on FSM write operations must
> not be violated - otherwise, a Raft snapshot-based rebalance would be broken.
> This means we may have two non-consistent storage updates on the primary which
> may affect different fsyncs, so maybe we should benchmark this optimization to
> find out how useful it is. Transactional correctness isn't violated by these
> non-consistent storage updates: the only possibility is that some writes or
> write intents go ahead of the indexes and are therefore included into
> snapshots - however, we can still process such writes and resolve such write
> intents. The index-only adjustment is sketched below.
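> A sketch of the index-only adjustment on replication, reusing the hypothetical
> SafeTimeGuard above; the Storage interface here is illustrative, not the real
> Ignite 3 storage API:
> {code:java}
> // Illustrative storage interface: an MVCC row write plus the applied index/term counters.
> interface Storage {
>     void addWrite(long rowId, byte[] row);   // write (intent) to the partition storage
>     void lastApplied(long index, long term); // advance the applied index/term
> }
>
> final class ReplicationApplier {
>     /** Replication (FSM) handler: the row write may be skipped, the indexes never are. */
>     static void apply(Storage storage, SafeTimeGuard guard,
>             long commandTs, long index, long term, long rowId, byte[] row) {
>         if (guard.shouldApply(commandTs)) {
>             storage.addWrite(rowId, row); // the primary did not pre-apply this write
>         }
>
>         // Always advance the indexes so that a Raft snapshot captures the correct log position.
>         storage.lastApplied(index, term);
>
>         guard.advance(commandTs);
>     }
> }
> {code}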
> Also, the safe time now needs to be updated on the primary replica. There can
> be the following scenarios:
> # Two-phase commit: we can advance the safe time on the primary, make the
> pre-replication update, and then run the Raft command. Both the safe time
> adjustment and the storage update happen before replication.
> # One-phase commit: the safe time should be advanced after the completion of
> the Raft command's future. There is no happens-before between the future
> callback and the replication handler, so the safe time should be checked and
> advanced in both places. We should use a critical section for the given
> transaction, preventing a race between the safe time check, the safe time
> adjustment, and the storage update (see the sketch after this list).
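> A sketch of the per-transaction critical section for 1PC, again using the
> hypothetical types from the sketches above: both the Raft-future callback and
> the local replication handler call applyOnce, and the lock makes "check safe
> time, write, advance" atomic:
> {code:java}
> // Hypothetical: both racing paths (future callback and replication handler) go through here.
> class OnePhaseCommitApplier {
>     private final Object mutex = new Object();
>     private final SafeTimeGuard guard;
>     private final Storage storage;
>
>     OnePhaseCommitApplier(SafeTimeGuard guard, Storage storage) {
>         this.guard = guard;
>         this.storage = storage;
>     }
>
>     /** Applies the write exactly once, whichever path gets here first. */
>     void applyOnce(long commandTs, long rowId, byte[] row) {
>         synchronized (mutex) {
>             // Check and advance under one lock, so the second caller sees the
>             // adjusted safe time and skips the write.
>             if (guard.shouldApply(commandTs)) {
>                 storage.addWrite(rowId, row);
>                 guard.advance(commandTs);
>             }
>         }
>     }
> }
> {code}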