[https://issues.apache.org/jira/browse/CASSANDRA-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058104#comment-18058104]
Arvind Kandpal commented on CASSANDRA-21152:
--------------------------------------------
Thanks for the review! These changes address two different layers of the
problem that I found during the investigation. They still need final
verification: I tried to write a test file, but could not reproduce the build
failure with it.
# Disabling MV Support: The DTest failure showed that CursorCompactor is
currently unsafe for Materialized Views (it lacks MV-specific logic, such as
GarbageSkipper, to handle view liveness). I disabled it for MVs to prevent data
inconsistency and crashes.
# Merging Logic Fix: While debugging the MV crash, I found a more general bug in
CursorCompactor: it was also failing to drop expired rows for standard tables.
I included this fix so that standard tables don't retain "zombie" expired rows
during compaction.
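Point 1 amounts to a routing decision made before compaction starts. A minimal sketch of such a guard, assuming a hypothetical `TableInfo` record with an `isView` flag and a hypothetical `CompactionPath` enum (neither is Cassandra's API); the actual patch may select the compactor differently.

{code:java}
public class CompactorSelection {
    // Hypothetical labels: CURSOR = CursorCompactor, ITERATOR = the existing non-cursor path.
    enum CompactionPath { CURSOR, ITERATOR }

    // Hypothetical table descriptor carrying only what the guard needs.
    record TableInfo(String name, boolean isView) {}

    // Materialized views are routed away from the cursor path, which
    // lacks MV-specific liveness handling such as GarbageSkipper.
    static CompactionPath choosePath(TableInfo table) {
        if (table.isView())
            return CompactionPath.ITERATOR;
        return CompactionPath.CURSOR;
    }

    public static void main(String[] args) {
        System.out.println(choosePath(new TableInfo("t", false)));
        System.out.println(choosePath(new TableInfo("mv2", true)));
    }
}
{code}

Running `main` prints `CURSOR` for the standard table and `ITERATOR` for the view.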
I tested this manually by adding debug logs to the code. Here is what I did:
Scenario A (without the fix): I instrumented the code to detect whether an
expired row was being kept:
{code:java}
// Original logic, instrumented with a debug log
if (rowActiveDeletion.deletes(mergedRowInfo) ||
    purger.shouldPurge(mergedRowInfo, nowInSec))
{
    mergedRowInfo = LivenessInfo.EMPTY;
}
else if (mergedRowInfo.isExpiring() && !mergedRowInfo.isLive(nowInSec))
{
    // Debug only: the row's TTL has passed, yet neither condition above cleared it
    logger.info("BUG DETECTED! Row is expired (TTL) but CursorCompactor is keeping it alive!");
}
{code}
Result: the logs confirmed the bug:
{code}
BUG DETECTED!
{code}
Scenario B (with the fix): I updated the logic to explicitly check for expiration:
{code:java}
// Fixed logic: an expired TTL row is cleared like a deleted or purgeable one
if (rowActiveDeletion.deletes(mergedRowInfo) ||
    (mergedRowInfo.isExpiring() && !mergedRowInfo.isLive(nowInSec)) || // <--- added check
    purger.shouldPurge(mergedRowInfo, nowInSec))
{
    mergedRowInfo = LivenessInfo.EMPTY;
    // Debug only: this fires when any of the three conditions holds, not just TTL expiry
    logger.info("FIX WORKING! CursorCompactor detected expired row and DROPPED it.");
}
{code}
Result: the logs confirmed the fix:
{code}
FIX WORKING!
{code}
This confirms that the logic fix is valid for standard tables, even though we
are disabling CursorCompactor for MVs.
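The two scenarios can also be reproduced without a cluster, which may help with the test file mentioned above. A minimal standalone model, not CursorCompactor itself: row deletion and purge are reduced to hypothetical boolean inputs, and liveness to a hypothetical `Row` record with an optional expiration time in seconds, assuming a TTL row is live strictly before that time. It checks that the original condition keeps an expired TTL row, the fixed one drops it, and rows without a TTL are unaffected.

{code:java}
import java.util.OptionalLong;

public class ExpiredRowMergeModel {
    // Hypothetical row liveness: expirationSec is empty for rows written without a TTL.
    record Row(OptionalLong expirationSec) {
        boolean isExpiring() { return expirationSec.isPresent(); }
        // Assumption: an expiring row is live strictly before its expiration time.
        boolean isLive(long nowInSec) { return !isExpiring() || nowInSec < expirationSec.getAsLong(); }
    }

    // Original condition: only row deletion or purge clears liveness.
    // row and nowInSec are unused here; kept so both conditions take the same inputs.
    static boolean originalDrops(Row row, boolean deletedByRowDeletion, boolean purgeable, long nowInSec) {
        return deletedByRowDeletion || purgeable;
    }

    // Fixed condition: an expired TTL row is cleared as well.
    static boolean fixedDrops(Row row, boolean deletedByRowDeletion, boolean purgeable, long nowInSec) {
        return deletedByRowDeletion
            || (row.isExpiring() && !row.isLive(nowInSec))
            || purgeable;
    }

    public static void main(String[] args) {
        Row expired = new Row(OptionalLong.of(100));   // TTL ran out at t=100
        Row permanent = new Row(OptionalLong.empty()); // written without a TTL
        long now = 200;
        System.out.println("original keeps expired row: " + !originalDrops(expired, false, false, now));
        System.out.println("fixed drops expired row: " + fixedDrops(expired, false, false, now));
        System.out.println("fixed keeps non-TTL row: " + !fixedDrops(permanent, false, false, now));
    }
}
{code}

Running `main` prints `true` on all three lines, mirroring the BUG DETECTED / FIX WORKING observations.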
I hope this helps explain the changes! Please take a look and let me know if
I've done anything wrong.
> Test failure: dtest.TestMaterializedViews.test_mv_with_default_ttl_with_flush
> ------------------------------------------------------------------------------
>
> Key: CASSANDRA-21152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21152
> Project: Apache Cassandra
> Issue Type: Bug
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 5.x
>
> Attachments: breaking_point.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> It fails consistently now, example:
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/2391/testReport/junit/dtest.materialized_views_test/TestMaterializedViews/Tests___dtest_jdk11_1_64___test_mv_with_default_ttl_with_flush/]
> {code:java}
> self = <materialized_views_test.TestMaterializedViews object at 0x7f808eb3b3d0>
> @since('3.0')
> def test_mv_with_default_ttl_with_flush(self):
> > self._test_mv_with_default_ttl(True)
> materialized_views_test.py:1333:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _
> materialized_views_test.py:1368: in _test_mv_with_default_ttl
> assert_none(session, "SELECT k,a,b FROM mv2")
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _
> session = <cassandra.cluster.Session object at 0x7f808dcc1d60>
> query = 'SELECT k,a,b FROM mv2', cl = None
> def assert_none(session, query, cl=None):
> """
> Assert query returns nothing
> @param session Session to use
> @param query Query to run
> @param cl Optional Consistency Level setting. Default ONE
>
> Examples:
> assert_none(self.session1, "SELECT * FROM test where key=2;")
> assert_none(cursor, "SELECT * FROM test WHERE k=2",
> cl=ConsistencyLevel.SERIAL)
> """
> simple_query = SimpleStatement(query, consistency_level=cl)
> res = session.execute(simple_query)
> list_res = _rows_to_list(res)
> > assert list_res == [], "Expected nothing from {}, but got {}".format(query, list_res)
> E AssertionError: Expected nothing from SELECT k,a,b FROM mv2, but got [[1, 1, None]]
> tools/assertions.py:149: AssertionError
> {code}
> !breaking_point.png|width=700!
> It broke between Cassandra trunk runs 2364 and 2367.
> [https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-trunk/failure/materialized_views_test/TestMaterializedViews/test_mv_with_default_ttl_with_flush]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)