[jira] [Updated] (IGNITE-19043) ItRaftCommandLeftInLogUntilRestartTest: PageMemoryHashIndexStorage lacks data after cluster restart

Alexander Lapin (Jira) Wed, 29 Mar 2023 05:05:38 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-19043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alexander Lapin updated IGNITE-19043:
-------------------------------------
    Description: 
After being enabled, ItRaftCommandLeftInLogUntilRestartTest failed with
{code:java}
org.opentest4j.AssertionFailedError: expected: not <null> {code}
while trying to retrieve previously added data after a cluster restart. This 
seems to be because there is no corresponding data in the PK index.

It is worth mentioning that this test was originally about raft log 
re-application on node restart. So I commented out all 
partitionUpdateInhibitor usages to check whether the failure is related to 
re-application or to the indexes themselves; the problem is reproducible 
without the re-application logic.

It might be related to the migration of defaults from RocksDB to page memory. 
Further investigation is required.
h3. Implementation notes

The investigation showed that the failure occurs because raft log 
re-application is skipped within PartitionListener#handleUpdateCommand and 
PartitionListener#handleUpdateAllCommand, due to the following logic:

 
{code:java}
TxMeta txMeta = txStateStorage.get(cmd.txId());
if (txMeta != null && (txMeta.txState() == COMMITED || txMeta.txState() == ABORTED)) {
    storage.runConsistently(() -> {
        storage.lastApplied(commandIndex, commandTerm);
        return null;
    });
}
{code}

The full scenario is as follows:

1. tx1.put populates the raft log and mvPartitionStorage with the 
corresponding log record and data.

2. tx1.commit also populates the raft log with a raft record and finishes the 
transaction within txnStateStorage, along with cleanup in mvPartitionStorage.

3. The RocksDB-based txnStateStorage flushes its state to disk, while the 
page-memory-based mvPartitionStorage does not.

4. After node restart, raft replays the log, both the put and commit commands; 
however, on the commit partition we skip put re-application because of the 
aforementioned
{code:java}
if (txMeta != null && (txMeta.txState() == COMMITED || txMeta.txState() == ABORTED)){code}
To be clear, the transaction is considered committed because txnStateStorage 
flushes its state before the node stops.

 

So, to fix this issue it is enough to remove the skip logic.
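
The scenario above can be reproduced with a small in-memory model. This is only a sketch under stated assumptions, not the Ignite code: the ReplaySketch class, its Command type, and the apply and readAfterRestart methods are hypothetical names, a HashMap stands in for the unflushed mvPartitionStorage, and a HashSet stands in for the flushed txnStateStorage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the restart scenario. All names are hypothetical, not Ignite API. */
class ReplaySketch {
    /** A raft log entry: a put carries a key and value, a commit carries neither. */
    static final class Command {
        final String txId;
        final String key;
        final String value;

        Command(String txId, String key, String value) {
            this.txId = txId;
            this.key = key;
            this.value = value;
        }

        boolean isPut() {
            return key != null;
        }
    }

    /** Applies one command. With skipFinished=true, puts of finished transactions are skipped. */
    static void apply(Command cmd, Map<String, String> data, Set<String> finishedTxs, boolean skipFinished) {
        if (cmd.isPut()) {
            if (skipFinished && finishedTxs.contains(cmd.txId)) {
                return; // mimics the skip: the put is not re-applied
            }
            data.put(cmd.key, cmd.value);
        } else {
            finishedTxs.add(cmd.txId); // commit: mark the transaction as finished
        }
    }

    /** Runs steps 1 to 4 and returns the value read for key "k" after the restart. */
    static String readAfterRestart(boolean skipFinished) {
        List<Command> raftLog = new ArrayList<>();
        Map<String, String> data = new HashMap<>();  // stands in for unflushed partition data
        Set<String> finishedTxs = new HashSet<>();   // stands in for flushed tx state

        // Steps 1 and 2: tx1.put and tx1.commit are logged and applied.
        List<Command> commands = List.of(new Command("tx1", "k", "v1"), new Command("tx1", null, null));
        for (Command cmd : commands) {
            raftLog.add(cmd);
            apply(cmd, data, finishedTxs, skipFinished);
        }

        // Step 3: restart. Tx state survives, partition data is lost.
        Set<String> persistedTxs = new HashSet<>(finishedTxs);
        Map<String, String> restoredData = new HashMap<>();

        // Step 4: raft replays the whole log.
        for (Command cmd : raftLog) {
            apply(cmd, restoredData, persistedTxs, skipFinished);
        }
        return restoredData.get("k");
    }

    public static void main(String[] args) {
        System.out.println("with skip: " + readAfterRestart(true));
        System.out.println("without skip: " + readAfterRestart(false));
    }
}
```

With the skip enabled the model reads null after the restart, which matches the assertion failure; with the skip removed the put is re-applied and the value is read back. The model assumes that re-applying a put is harmless, which the real storage would have to guarantee as well.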



> ItRaftCommandLeftInLogUntilRestartTest: PageMemoryHashIndexStorage lacks data 
> after cluster restart
> ---------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-19043
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19043
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
