[ 
https://issues.apache.org/jira/browse/CASSANDRA-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058104#comment-18058104
 ] 

Arvind Kandpal commented on CASSANDRA-21152:
--------------------------------------------

Thanks for the review! These changes address two different layers of the 
problem I found during investigation. They still need final verification: I 
tried to write a test file, but I could not reproduce the build failure with it.

# Disabling MV Support: The DTest failure showed that CursorCompactor is 
currently unsafe for Materialized Views (it lacks specific logic like 
GarbageSkipper to handle view liveness). I disabled it to prevent data 
inconsistency and crashes for MVs.
# Merging Logic Fix: While debugging the MV crash, I found a general bug in 
CursorCompactor. It was failing to drop expired rows for Standard Tables as 
well. I included this fix so that standard tables don't retain "zombie" expired 
rows during compaction.

I tested manually by adding debug logs to the code. Here is what I did:

Scenario A: Without the fix. I instrumented the code to detect whether an 
expired row was being kept:


{code:java}
// Original Logic
if (rowActiveDeletion.deletes(mergedRowInfo) || 
purger.shouldPurge(mergedRowInfo, nowInSec)) {
    mergedRowInfo = LivenessInfo.EMPTY;
}
else if (mergedRowInfo.isExpiring() && !mergedRowInfo.isLive(nowInSec)) {
    logger.info("BUG DETECTED! Row is expired (TTL) but CursorCompactor is keeping it alive!");
}
{code}

{code}
Result: The logs confirmed the bug: BUG DETECTED!
{code}


Scenario B: With the fix. I updated the logic to explicitly check for expiration:


{code:java}
// Fixed Logic
if (rowActiveDeletion.deletes(mergedRowInfo) || 
   (mergedRowInfo.isExpiring() && !mergedRowInfo.isLive(nowInSec)) || // <--- Added check
   purger.shouldPurge(mergedRowInfo, nowInSec)) 
{
    mergedRowInfo = LivenessInfo.EMPTY;
    logger.info("FIX WORKING! CursorCompactor detected expired row and DROPPED it.");
}
{code}

{code}
Result: The logs confirmed the fix: FIX WORKING!
{code}

This confirms the logic fix is valid for standard tables, independently of 
disabling CursorCompactor for MVs.
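
To make the difference between the two conditions above easier to see outside 
the Cassandra code base, here is a minimal, self-contained sketch. It is only a 
toy model: ToyLiveness, mergeOriginal and mergeFixed are hypothetical names 
invented for this illustration, and the booleans deleted and purgeable stand in 
for rowActiveDeletion.deletes(...) and purger.shouldPurge(...). It does not 
reproduce Cassandra's real LivenessInfo or purging rules.

```java
public class ExpiredRowSketch {
    // Toy stand-in for row liveness. NOT Cassandra's LivenessInfo (assumption).
    static final class ToyLiveness {
        static final ToyLiveness EMPTY = new ToyLiveness(-1, -1);

        final long ttlSeconds;         // -1 means the row has no TTL
        final long localExpirationSec; // second at which the row expires

        ToyLiveness(long ttlSeconds, long localExpirationSec) {
            this.ttlSeconds = ttlSeconds;
            this.localExpirationSec = localExpirationSec;
        }

        boolean isExpiring() {
            return ttlSeconds >= 0;
        }

        boolean isLive(long nowInSec) {
            return this != EMPTY && (!isExpiring() || nowInSec < localExpirationSec);
        }
    }

    // Shape of the original condition: only deletion or purging clears liveness.
    static ToyLiveness mergeOriginal(ToyLiveness info, boolean deleted,
                                     boolean purgeable, long nowInSec) {
        return (deleted || purgeable) ? ToyLiveness.EMPTY : info;
    }

    // Shape of the fixed condition: an expired TTL also clears liveness.
    static ToyLiveness mergeFixed(ToyLiveness info, boolean deleted,
                                  boolean purgeable, long nowInSec) {
        boolean expired = info.isExpiring() && !info.isLive(nowInSec);
        return (deleted || expired || purgeable) ? ToyLiveness.EMPTY : info;
    }

    public static void main(String[] args) {
        long now = 1_000;
        ToyLiveness expired = new ToyLiveness(60, 900);    // expired 100s ago
        ToyLiveness stillLive = new ToyLiveness(60, 1_100); // expires in 100s

        // Original shape keeps the expired row's liveness.
        System.out.println(mergeOriginal(expired, false, false, now) == ToyLiveness.EMPTY);
        // Fixed shape drops it.
        System.out.println(mergeFixed(expired, false, false, now) == ToyLiveness.EMPTY);
        // Fixed shape leaves a row whose TTL has not elapsed untouched.
        System.out.println(mergeFixed(stillLive, false, false, now) == stillLive);
    }
}
```

In this model the only behavioral difference is the expired row that is neither 
deleted nor purgeable, which matches the case the debug logs above were 
targeting.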

I hope this helps explain the changes! Please take a look and let me know if 
I got anything wrong.

> Test failure:  dtest.TestMaterializedViews.test_mv_with_default_ttl_with_flush
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21152
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21152
>             Project: Apache Cassandra
>          Issue Type: Bug
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: breaking_point.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It fails consistently now, example: 
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/2391/testReport/junit/dtest.materialized_views_test/TestMaterializedViews/Tests___dtest_jdk11_1_64___test_mv_with_default_ttl_with_flush/]
> {code:java}
> self = <materialized_views_test.TestMaterializedViews object at 
> 0x7f808eb3b3d0>
>     @since('3.0')
>     def test_mv_with_default_ttl_with_flush(self):
> >       self._test_mv_with_default_ttl(True)
> materialized_views_test.py:1333: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> materialized_views_test.py:1368: in _test_mv_with_default_ttl
>     assert_none(session, "SELECT k,a,b FROM mv2")
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> session = <cassandra.cluster.Session object at 0x7f808dcc1d60>
> query = 'SELECT k,a,b FROM mv2', cl = None
>     def assert_none(session, query, cl=None):
>         """
>         Assert query returns nothing
>         @param session Session to use
>         @param query Query to run
>         @param cl Optional Consistency Level setting. Default ONE
>     
>         Examples:
>         assert_none(self.session1, "SELECT * FROM test where key=2;")
>         assert_none(cursor, "SELECT * FROM test WHERE k=2", 
> cl=ConsistencyLevel.SERIAL)
>         """
>         simple_query = SimpleStatement(query, consistency_level=cl)
>         res = session.execute(simple_query)
>         list_res = _rows_to_list(res)
> >       assert list_res == [], "Expected nothing from {}, but got 
> > {}".format(query, list_res)
> E       AssertionError: Expected nothing from SELECT k,a,b FROM mv2, but got 
> [[1, 1, None]]
> tools/assertions.py:149: AssertionError
> {code}
> !breaking_point.png|width=700!
> it was broken between 2364 and 2367 Cassandra trunk runs.
> [https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-trunk/failure/materialized_views_test/TestMaterializedViews/test_mv_with_default_ttl_with_flush]


