[jira] [Updated] (IGNITE-13358) Improvements for partition clearing related parts

Alexey Scherbakov (Jira) Fri, 14 Aug 2020 00:58:54 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alexey Scherbakov updated IGNITE-13358:
---------------------------------------
    Description: 
There are several issues related to partition clearing that are worth fixing.

1. PartitionsEvictManager doesn't provide clear correctness guarantees when a 
node or a cache group is stopped while partitions are concurrently being 
cleared.

2. GridDhtLocalPartition#awaitDestroy is called while holding the topology 
write lock, which is deadlock-prone because destroying a partition currently 
requires the write lock.

3. GridDhtLocalPartition contains a lot of messy code related to partition 
clearing, most notably ClearFuture, although the clearing itself is done by 
PartitionsEvictManager. We should get rid of the clearing code in 
GridDhtLocalPartition. This should also improve code readability and make it 
easier to understand what happens during clearing.

4. Currently, moving partitions are cleared before rebalancing in an order 
different from rebalanceOrder, breaking the contract.

5. The clearing logic for moving partitions (before rebalancing) seems 
incorrect: it's possible to lose updates received during clearing.

6. To clear partitions before full rebalancing we use the same threads as for 
partition eviction. This can slow down rebalancing even if spare resources are 
available. It is better to clear partitions in the rebalance pool (explicitly 
dedicated by the user).

7. It's possible to reserve a renting partition, which makes no sense. All 
operations on a renting partition (except clearing) are a waste of resources.

8. Partition eviction causes system pool starvation if the number of threads 
in the system pool is 1. This can break crucial functionality.
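
The starvation described in item 8 can be illustrated with a minimal sketch. 
This is not Ignite code: the class name, the method and both executors are 
hypothetical stand-ins. A long-running clearing task is submitted either to a 
single-threaded "system" pool or to a separate pool, and a short "crucial" 
task is then submitted to the system pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration only; none of these names exist in Ignite. */
public class PoolStarvationSketch {
    /**
     * @param dedicatedClearingPool If true, clearing runs in its own pool.
     * @return True if the crucial task could not finish while clearing was in progress.
     */
    public static boolean crucialTaskStarved(boolean dedicatedClearingPool) throws Exception {
        // Stand-in for a system pool configured with a single thread.
        ExecutorService sysPool = Executors.newFixedThreadPool(1);
        ExecutorService clearingPool = dedicatedClearingPool ? Executors.newFixedThreadPool(1) : sysPool;

        CountDownLatch clearingStarted = new CountDownLatch(1);
        CountDownLatch clearingMayFinish = new CountDownLatch(1);

        try {
            // Stand-in for a long partition clearing: it occupies its thread until released.
            clearingPool.submit(() -> {
                clearingStarted.countDown();
                clearingMayFinish.await();
                return null;
            });

            clearingStarted.await();

            // A short task that other functionality depends on.
            Future<String> crucial = sysPool.submit(() -> "done");

            try {
                crucial.get(dedicatedClearingPool ? 5000 : 200, TimeUnit.MILLISECONDS);

                return false;
            }
            catch (TimeoutException e) {
                return true; // The only system thread is busy with clearing.
            }
        }
        finally {
            clearingMayFinish.countDown();
            sysPool.shutdown();

            if (clearingPool != sysPool)
                clearingPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared pool starved: " + crucialTaskStarved(false));
        System.out.println("dedicated pool starved: " + crucialTaskStarved(true));
    }
}
```

In this sketch the crucial task times out only when clearing shares the single 
system thread; moving the clearing to its own pool lets it complete.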

  was:
There are several issues related to partition clearing that are worth fixing.

1. PartitionsEvictManager doesn't provide clear correctness guarantees when a 
node or a cache group is stopped while partitions are concurrently being 
cleared.

2. GridDhtLocalPartition#awaitDestroy is called while holding the topology 
write lock, which is deadlock-prone because destroying a partition currently 
requires the write lock.

3. GridDhtLocalPartition contains a lot of messy code related to partition 
clearing, most notably ClearFuture, although the clearing itself is done by 
PartitionsEvictManager. We should get rid of the clearing code in 
GridDhtLocalPartition. This should also improve code readability and make it 
easier to understand what happens during clearing.

4. Currently, moving partitions are cleared before rebalancing in an order 
different from rebalanceOrder, breaking the contract.

5. The clearing logic for moving partitions (before rebalancing) seems 
incorrect: it's possible to lose updates received during clearing.

6. To clear partitions before full rebalancing we use the same threads as for 
partition eviction. This can slow down rebalancing even if spare resources are 
available. It is better to clear partitions in the rebalance pool (explicitly 
dedicated by the user).

7. It's possible to reserve a renting partition, which makes no sense. All 
operations on a renting partition (except clearing) are a waste of resources.

8. Partition eviction causes system pool starvation if the number of threads 
in the system pool is < 8. This can break crucial functionality.
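
The ordering contract mentioned in item 4 can be sketched as follows. This is 
an illustration under stated assumptions, not the Ignite implementation: the 
Group class and the clearingOrder method are hypothetical, lower rebalanceOrder 
values are assumed to go first, and any special handling of a zero order is 
not modelled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Ignite. */
public class ClearingOrderSketch {
    /** Minimal stand-in for a cache group with a configured rebalance order. */
    public static final class Group {
        final String name;
        final int rebalanceOrder;

        public Group(String name, int rebalanceOrder) {
            this.name = name;
            this.rebalanceOrder = rebalanceOrder;
        }
    }

    /**
     * Returns group names in the order their moving partitions would be cleared,
     * so that clearing follows the same order as rebalancing (assumed ascending).
     */
    public static List<String> clearingOrder(List<Group> groups) {
        List<Group> sorted = new ArrayList<>(groups);

        // List.sort is stable: groups with equal order keep their submission order.
        sorted.sort(Comparator.comparingInt(g -> g.rebalanceOrder));

        List<String> names = new ArrayList<>();

        for (Group g : sorted)
            names.add(g.name);

        return names;
    }
}
```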


> Improvements for partition clearing related parts
> -------------------------------------------------
>
>                 Key: IGNITE-13358
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13358
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Scherbakov
>            Assignee: Alexey Scherbakov
>            Priority: Major
>
> There are several issues related to partition clearing that are worth fixing.
> 1. PartitionsEvictManager doesn't provide clear correctness guarantees when a 
> node or a cache group is stopped while partitions are concurrently being 
> cleared.
> 2. GridDhtLocalPartition#awaitDestroy is called while holding the topology 
> write lock, which is deadlock-prone because destroying a partition currently 
> requires the write lock.
> 3. GridDhtLocalPartition contains a lot of messy code related to partition 
> clearing, most notably ClearFuture, although the clearing itself is done by 
> PartitionsEvictManager. We should get rid of the clearing code in 
> GridDhtLocalPartition. This should also improve code readability and make it 
> easier to understand what happens during clearing.
> 4. Currently, moving partitions are cleared before rebalancing in an order 
> different from rebalanceOrder, breaking the contract.
> 5. The clearing logic for moving partitions (before rebalancing) seems 
> incorrect: it's possible to lose updates received during clearing.
> 6. To clear partitions before full rebalancing we use the same threads as for 
> partition eviction. This can slow down rebalancing even if spare resources 
> are available. It is better to clear partitions in the rebalance pool 
> (explicitly dedicated by the user).
> 7. It's possible to reserve a renting partition, which makes no sense. All 
> operations on a renting partition (except clearing) are a waste of resources.
> 8. Partition eviction causes system pool starvation if the number of threads 
> in the system pool is 1. This can break crucial functionality.
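
The hazard behind item 2 (waiting for a partition to be destroyed while 
holding the write lock that the destroy itself needs) can be shown with a 
minimal sketch. It is not Ignite code: topLock, the destroyer thread and the 
destroyCompletes method are hypothetical stand-ins, and a timeout replaces 
what would otherwise be an indefinite wait.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration only; none of these names exist in Ignite. */
public class AwaitUnderLockSketch {
    /**
     * @param holdWriteLockWhileWaiting If true, the caller waits for the destroy under the write lock.
     * @return True if the destroy completed while the caller was waiting.
     */
    public static boolean destroyCompletes(boolean holdWriteLockWhileWaiting) throws Exception {
        ReentrantReadWriteLock topLock = new ReentrantReadWriteLock();
        CompletableFuture<Void> destroyed = new CompletableFuture<>();

        if (holdWriteLockWhileWaiting)
            topLock.writeLock().lock();

        // Stand-in for the thread that destroys the partition; it needs the write lock.
        Thread destroyer = new Thread(() -> {
            topLock.writeLock().lock();

            try {
                destroyed.complete(null);
            }
            finally {
                topLock.writeLock().unlock();
            }
        });

        destroyer.setDaemon(true);
        destroyer.start();

        try {
            // Without a timeout, the locked variant would wait forever.
            destroyed.get(holdWriteLockWhileWaiting ? 200 : 5000, TimeUnit.MILLISECONDS);

            return true;
        }
        catch (TimeoutException e) {
            return false; // The destroyer is blocked on the lock held by the waiter.
        }
        finally {
            if (holdWriteLockWhileWaiting)
                topLock.writeLock().unlock();

            destroyer.join();
        }
    }
}
```

Here the wait succeeds only when it is performed outside the write lock, which 
is one way to read the fix direction implied by the item.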



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
