Hi Igniters,

I've been looking into various scenarios of Partition Loss Policies usage 
recently,
and found a number of issues in the current implementation.

I'll start with an overview, but if you'd like to dive to a proposal I have 
right now then please
feel free to scroll down to TLDR.

The list of issues is below:
https://issues.apache.org/jira/browse/IGNITE-10041: Partition loss policies 
work incorrectly with BLT
https://issues.apache.org/jira/browse/IGNITE-10043: Lost partitions list is 
reset when only one server is alive in the cluster
https://issues.apache.org/jira/browse/IGNITE-9841: SQL doesn't take lost 
partitions into account when persistence is enabled
https://issues.apache.org/jira/browse/IGNITE-10057: SQL queries hang during 
rebalance if there are LOST partitions
https://issues.apache.org/jira/browse/IGNITE-9902: ScanQuery doesn't take lost 
partitions into account
https://issues.apache.org/jira/browse/IGNITE-10059: Local scan query against 
LOST partition fails
https://issues.apache.org/jira/browse/IGNITE-10044: LOST partition is marked as 
OWNING after the owner rejoins with existing persistent data
https://issues.apache.org/jira/browse/IGNITE-10058: resetLostPartitions() 
leaves an additional copy of a partition in the cluster

I'm sure this is not a complete list, but this is what I could find by tackling 
how queries and 
persistence interact with current handling of partition loss.

It seems that the issues - from this list and some other fixed recently - can 
be split into three categories
- corner case bugs - there are always some, and we can fix them as they show up
- handling of lost partitions by different APIs - while JCache API handles lost 
partitions fine, 
SQL and Scan queries have known issues; other APIs, such different types of 
queries, DataStreamer,
etc probably need to have more testing
- Partition Loss Policices + BLT 
(https://issues.apache.org/jira/browse/IGNITE-10041) - BLT seems 
to be fundametally conflicting with the pre-existing semantics of partition loss

While the former two categories can be solved case-by-case, the last one needs 
a wider design effort.
We need to reimagine our partition loss semantics for BLT, and change behavior 
accordingly.
For now, most of the features don't really work for a cache with BLT, with only 
READ_WRITE_SAFE and
READ_ONLY_SAFE working correctly (good thing - these two are the most useful 
policies anyway).

TLDR: we have issues with partition loss policices, and the largest one is that 
BLT semantics
conflict with most partition lost policices, and we need to address this 
somehow.

What I suggest to do right now:
1. Deny the configurations that don't work - e.g. just throw an exception if a 
cache starts
with BLT and PartitionLossPolicy.IGNORE or others.
2. Change default PartitionLossPolicy to READ_WRITE_SAFE *for persistent caches 
only*.
This is what effectively in place for the persistent caches already (since 
IGNORE semantics are
not supported), so this shouldn't bring a lot of compatibility issues.

I believe doing this will at least help us to protect the users from 
unexpected/inconsistent behavior.
Actual design changes can be done later, e.g. as a part of IEP 4 Phase 2/3.

WDYT?

Thanks,
Stan

Reply via email to