Hi Igniters,

I've been looking into various usage scenarios for Partition Loss Policies recently and found a number of issues in the current implementation.
I'll start with an overview, but if you'd like to jump straight to the proposal I have right now, please feel free to scroll down to the TLDR.

The list of issues is below:

https://issues.apache.org/jira/browse/IGNITE-10041: Partition loss policies work incorrectly with BLT
https://issues.apache.org/jira/browse/IGNITE-10043: Lost partitions list is reset when only one server is alive in the cluster
https://issues.apache.org/jira/browse/IGNITE-9841: SQL doesn't take lost partitions into account when persistence is enabled
https://issues.apache.org/jira/browse/IGNITE-10057: SQL queries hang during rebalance if there are LOST partitions
https://issues.apache.org/jira/browse/IGNITE-9902: ScanQuery doesn't take lost partitions into account
https://issues.apache.org/jira/browse/IGNITE-10059: Local scan query against LOST partition fails
https://issues.apache.org/jira/browse/IGNITE-10044: LOST partition is marked as OWNING after the owner rejoins with existing persistent data
https://issues.apache.org/jira/browse/IGNITE-10058: resetLostPartitions() leaves an additional copy of a partition in the cluster

I'm sure this is not a complete list, but this is what I could find by looking at how queries and persistence interact with the current handling of partition loss.

It seems that the issues, both from this list and some others fixed recently, can be split into three categories:

- corner case bugs - there are always some, and we can fix them as they show up
- handling of lost partitions by different APIs - while the JCache API handles lost partitions fine, SQL and Scan queries have known issues; other APIs, such as other types of queries, DataStreamer, etc., probably need more testing
- Partition Loss Policies + BLT (https://issues.apache.org/jira/browse/IGNITE-10041) - BLT seems to fundamentally conflict with the pre-existing semantics of partition loss

While the first two categories can be solved case by case, the last one needs a wider design effort.
We need to rethink our partition loss semantics for BLT and change the behavior accordingly. For now, most of the features don't really work for a cache with BLT, with only READ_WRITE_SAFE and READ_ONLY_SAFE working correctly (the good thing is that these two are the most useful policies anyway).

TLDR: we have issues with partition loss policies, and the largest one is that BLT semantics conflict with most partition loss policies, so we need to address this somehow.

What I suggest we do right now:

1. Deny the configurations that don't work - e.g. just throw an exception if a cache starts with BLT and PartitionLossPolicy.IGNORE or another policy that doesn't work with BLT.
2. Change the default PartitionLossPolicy to READ_WRITE_SAFE *for persistent caches only*. This is what is effectively in place for persistent caches already (since IGNORE semantics are not supported), so this shouldn't bring a lot of compatibility issues.

I believe doing this will at least help us protect users from unexpected/inconsistent behavior. Actual design changes can be done later, e.g. as a part of IEP 4 Phase 2/3.

WDYT?

Thanks,
Stan
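
P.S. To make the two suggestions a bit more concrete, here is a rough, self-contained sketch of what the checks could look like. It does not use the Ignite API: the Policy enum only mirrors the constant names of PartitionLossPolicy, and PartitionLossPolicySketch, effectivePolicy, validate and the boolean flags are made-up names for illustration. It also assumes that the current default is IGNORE and that READ_ONLY_SAFE and READ_WRITE_SAFE are the only policies to allow with BLT, as described above.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch only: models the two proposed rules without depending on Ignite. */
public class PartitionLossPolicySketch {
    /** Hypothetical stand-in that mirrors the constant names of PartitionLossPolicy. */
    enum Policy { READ_ONLY_SAFE, READ_ONLY_ALL, READ_WRITE_SAFE, READ_WRITE_ALL, IGNORE }

    /** Assumption: the two policies reported above as working correctly with BLT. */
    static final Set<Policy> ALLOWED_WITH_BLT =
        EnumSet.of(Policy.READ_ONLY_SAFE, Policy.READ_WRITE_SAFE);

    /**
     * Suggestion 2: resolve the policy when the user did not configure one (null).
     * Persistent caches default to READ_WRITE_SAFE; others keep the assumed IGNORE default.
     */
    static Policy effectivePolicy(Policy configured, boolean persistent) {
        if (configured != null)
            return configured;

        return persistent ? Policy.READ_WRITE_SAFE : Policy.IGNORE;
    }

    /** Suggestion 1: fail fast on a cache start if the policy is not supported with BLT. */
    static void validate(String cacheName, Policy policy, boolean usesBaseline) {
        if (usesBaseline && !ALLOWED_WITH_BLT.contains(policy))
            throw new IllegalArgumentException("Partition loss policy " + policy +
                " is not supported with baseline topology [cache=" + cacheName + ']');
    }

    public static void main(String[] args) {
        // No explicit policy on a persistent cache: the new default passes validation.
        Policy dflt = effectivePolicy(null, true);
        validate("persistentCache", dflt, true);
        System.out.println("persistentCache -> " + dflt);

        // An explicit IGNORE on a cache with BLT is rejected at start.
        try {
            validate("legacyCache", Policy.IGNORE, true);
        }
        catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing at cache start keeps the unsupported combinations from silently falling back to different semantics, and an explicitly configured policy is never overridden by the new default.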