[
https://issues.apache.org/jira/browse/CASSANDRA-15368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971743#comment-16971743
]
Benedict Elliott Smith commented on CASSANDRA-15368:
----------------------------------------------------
Hi [~dimitarndimitrov],
I think you have it the wrong way around; in your parlance, we need:
* oldMemtable.accepts(<HW>) returns false
* oldMemtable.accepts(<LW>) returns false
* newMemtable.accepts(<HW>) returns true
* newMemtable.accepts(<LW>) returns true
If you look at the new documentation introduced in CASSANDRA-15367
[here|https://github.com/belliottsmith/cassandra/commit/ed6adf5eabe62f8ce6a1341e0c5423ba53036197#diff-f0a15c3588b56c5ce53ece7c48e325b5R109],
you'll see that there is a region at the start of all memtables where some
records from the prior {{group}}, that may have arbitrarily delayed obtaining
their {{ReplayPosition}}, are intermixed with those of the later group. This
region is essentially owned by both memtables, but only the later memtable
invalidates the relevant commit log records. The problem occurs if the earlier
flush fails (and we do not terminate the process), _or_ if the process
terminates with the later flush having completed but not the earlier one (since
we will use the start/end {{ReplayPosition}} associated with the sstable to
invalidate the commit log in the same way).
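
To make the shared region concrete, here is a minimal Java sketch. It is not Cassandra code: {{OverlapSketch}}, {{LogRecord}}, {{sampleLog}}, {{lostRecords}} and the positions used are all hypothetical, and a plain {{long}} stands in for {{ReplayPosition}}. It only models the situation described above, where the later memtable's range is treated as contiguous and invalidated on flush while the earlier memtable has not flushed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverlapSketch {
    /** Hypothetical stand-in for a commit log record: its position and the memtable holding the write. */
    public static final class LogRecord {
        public final long position;
        public final String owner;

        public LogRecord(long position, String owner) {
            this.position = position;
            this.owner = owner;
        }
    }

    /**
     * Assumed example log: the old memtable's last writer was delayed and obtained
     * position 5 after the new memtable's first write obtained position 4.
     */
    public static List<LogRecord> sampleLog() {
        return Arrays.asList(
            new LogRecord(1, "old"), new LogRecord(2, "old"), new LogRecord(3, "old"),
            new LogRecord(4, "new"), new LogRecord(5, "old"), new LogRecord(6, "new"));
    }

    /**
     * Returns the records that fall inside the invalidated range [start, end]
     * but are not held by the memtable that was actually flushed. If the other
     * memtable never reaches disk, these records can no longer be replayed.
     */
    public static List<LogRecord> lostRecords(List<LogRecord> log, long start, long end, String flushedOwner) {
        List<LogRecord> lost = new ArrayList<>();
        for (LogRecord r : log) {
            boolean invalidated = r.position >= start && r.position <= end;
            if (invalidated && !r.owner.equals(flushedOwner)) {
                lost.add(r);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // The new memtable flushes and invalidates its whole range [4, 6].
        for (LogRecord r : lostRecords(sampleLog(), 4, 6, "new")) {
            System.out.println("lost: position " + r.position + " owned by " + r.owner);
        }
    }
}
```

In this toy log, the record at position 5 sits in the region owned by both memtables: it is invalidated by the later flush but only persisted by the earlier one, so it is the one reported as lost.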
> Failing to flush Memtable without terminating process results in permanent
> data loss
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15368
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15368
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Commit Log, Local/Memtable
> Reporter: Benedict Elliott Smith
> Priority: Normal
> Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> A {{Memtable}} does not contain records that cover a precise contiguous range of
> {{ReplayPosition}}, since there are only weak ordering constraints when
> rolling over to a new {{Memtable}} - the last operations for the old
> {{Memtable}} may obtain their {{ReplayPosition}} after the first operations
> for the new {{Memtable}}.
> Unfortunately, we treat the {{Memtable}} range as contiguous, and invalidate
> the entire range on flush. Ordinarily we only invalidate records when all
> prior {{Memtable}} have also successfully flushed. However, in the event of
> a flush failure that does not terminate the process (either because of disk
> failure policy, or because it is a software error), the later flush is able
> to invalidate the region of the commit log that includes records that should
> have been flushed in the prior {{Memtable}}.
> More problematically, this can also occur on restart without any associated
> flush failure, as we use commit log boundaries written to our flushed
> sstables to filter {{ReplayPosition}} on recovery, which is meant to
> replicate our {{Memtable}} flush behaviour above. However, we do not know
> that earlier flushes have completed, and they may complete successfully
> out-of-order. So any flush that completes before the process terminates, but
> began after another flush that _doesn’t_ complete before the process
> terminates, has the potential to cause permanent data loss.