Hi dev,

I'd like to draw your attention to an existing issue in our current read 
consistency level within the RatisConsensus module. As it stands, the default 
level is set to "query statemachine directly", which, while latency-friendly, 
has led to user-reported bugs. Specifically, successive SQL queries issued 
during a restart can return inconsistent results, a phantom-read problem that 
is confusing for our users.

To address this issue, I propose that we temporarily increase the read 
consistency level to linearizable read during restarts. This will ensure that 
we maintain data consistency during the critical recovery period. Once the 
cluster has successfully finished recovering from previous logs, we can then 
revert to the default consistency level.

You can find more details about this proposed solution in the linked pull 
request: https://github.com/apache/iotdb/pull/10597

**Please note** that this change may affect modules (including CQ, schema 
region, and data region) that call RatisConsensus.read during the restart 
process. In such cases, a RatisUnderRecoveryException may be returned, 
indicating that RatisConsensus cannot serve read requests while it is 
replaying the RaftLog. We therefore strongly encourage the affected modules to 
handle this case appropriately, for example by implementing a retry mechanism.
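
As a rough illustration of what such a retry could look like on the caller 
side, here is a minimal sketch. It is not code from the pull request: the 
RecoveryAwareReader class, the readWithRetry helper, its parameters, and the 
5-second backoff cap are all hypothetical, and the nested 
RatisUnderRecoveryException is only a stand-in so the example compiles on its 
own (the real exception class may have a different package and constructor).

```java
import java.util.concurrent.Callable;

public class RecoveryAwareReader {

    // Stand-in for the exception named in this proposal (assumption: the real
    // class in the consensus module may be declared differently).
    public static class RatisUnderRecoveryException extends Exception {
        public RatisUnderRecoveryException(String message) {
            super(message);
        }
    }

    /**
     * Runs the given read and retries with exponential backoff while the
     * consensus layer reports that it is still replaying its log.
     * Any other exception is propagated immediately.
     */
    public static <T> T readWithRetry(Callable<T> read, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return read.call();
            } catch (RatisUnderRecoveryException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the caller decide what to do
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, 5_000L); // cap the wait (hypothetical limit)
            }
        }
    }
}
```

A module would wrap its existing read call in the Callable passed to 
readWithRetry. Whether a bounded retry like this or failing fast is the better 
choice probably depends on the module, so please treat it only as a starting 
point for discussion.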

I look forward to hearing your thoughts on this proposal. Your feedback and 
suggestions will be appreciated.

Regards
William Song
