[ 
https://issues.apache.org/jira/browse/IGNITE-26119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maksim Davydov reassigned IGNITE-26119:
---------------------------------------

    Assignee: Maksim Davydov

> Create a set of tests to examine tx protocol behavior against an unstable 
> network
> ---------------------------------------------------------------------------------
>
>                 Key: IGNITE-26119
>                 URL: https://issues.apache.org/jira/browse/IGNITE-26119
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Sergey Chugunov
>            Assignee: Maksim Davydov
>            Priority: Major
>              Labels: ise
>
> The Ignite 2.x TX protocol is based on the Two-Phase Commit (2PC) algorithm, 
> which is known to be unreliable on an unstable network. Lost messages, 
> timeouts and network splits (aka split-brain situations) could lead to 
> data loss or data inconsistency.
> At the same time, there are no tests that verify the Ignite TX protocol in a 
> controlled environment.
> The task is to create a set of such tests and find improvements to the 
> protocol, logging and tooling to make it easier to track and fix problematic 
> transactions.
> A good example of such a scenario looks like this:
> # Cluster of 5 nodes, cache with 2 backups.
> # A transaction covering two partitions is started.
> # The finish message is sent to a backup node of one partition and the 
> primary node of another partition.
> # The other nodes don't receive this commit message because the tx 
> coordinator and the nodes from the previous step become unavailable.
> The task here is to assess what happens on the other three nodes that have 
> never seen the finish request and how they would recover the transaction. Is 
> it possible to get a data inconsistency between different nodes (e.g. if the 
> other three nodes decide to roll back the tx)? If yes, is it possible to 
> prevent this by plugging in a TopologyValidator?
> Options to expand this scenario include:
> # Assigning the tx coordinator role to different nodes.
> # Using different transaction concurrency and isolation levels.
> # Setting up different timeouts for tx rollback, network etc.
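
The example scenario above can be sketched as a toy model before any real
test is written. This is only an illustration, not Ignite code: the class
name TxPartitionScenario, the node names n1..n5 and the partition layout
are hypothetical, the tx coordinator is assumed to be a separate client
node, and a node that receives the finish message is assumed to apply the
commit before it fails. What the surviving nodes actually decide is exactly
what the requested tests would have to establish.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of the scenario above. It does not call or emulate Ignite code. */
public class TxPartitionScenario {
    public enum TxState { PREPARED, COMMITTED }
    public enum Decision { COMMIT, ROLLBACK }

    /** Hypothetical layout: index 0 is the primary, the rest are the 2 backups. */
    public static final Map<String, List<String>> OWNERS = new LinkedHashMap<>();
    static {
        OWNERS.put("p0", List.of("n1", "n2", "n3"));
        OWNERS.put("p1", List.of("n4", "n5", "n1"));
    }

    /** Step 3: a backup of p0 (n2) and the primary of p1 (n4) get the finish message. */
    public static final Set<String> GOT_FINISH = Set.of("n2", "n4");

    /** Step 4: the nodes that got the finish message fail; the rest survive. */
    public static Set<String> survivors() {
        Set<String> nodes = new TreeSet<>();
        OWNERS.values().forEach(nodes::addAll);
        nodes.removeAll(GOT_FINISH);
        return nodes;
    }

    /** Tx state per node just before the failure (assumes finish implies commit). */
    public static Map<String, TxState> statesBeforeFailure() {
        Map<String, TxState> states = new TreeMap<>();
        for (List<String> owners : OWNERS.values())
            for (String node : owners)
                states.put(node, GOT_FINISH.contains(node) ? TxState.COMMITTED : TxState.PREPARED);
        return states;
    }

    /** True if the survivors' decision contradicts a commit applied on a failed node. */
    public static boolean diverges(Decision survivorDecision) {
        boolean committedSomewhere = statesBeforeFailure().containsValue(TxState.COMMITTED);
        return committedSomewhere && survivorDecision == Decision.ROLLBACK;
    }

    public static void main(String[] args) {
        System.out.println("survivors: " + survivors());
        for (Decision d : Decision.values())
            System.out.println("survivors " + d + " -> divergence possible: " + diverges(d));
    }
}
```

In this model the three survivors hold copies of both partitions but only
in the PREPARED state, so a rollback on their side would conflict with the
commit already applied on the two failed nodes. A real test would replace
the assumed decision with the outcome observed on a running cluster.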



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
