Good afternoon,

I'm looking into some latent issues following a 3.11.x to 4.0.x upgrade. The system
uses materialized views, and the core problem is in how mutations are sent
from the parent table to two related materialized views.

In 3.11.x, without any tuning (no flag set for
*-Dcassandra.mv_enable_coordinator_batchlog*, no changes to the concurrent
MV writers, etc.), the cluster behaved fine if a node went down. After the
upgrade, there are tons of CL LOCAL_ONE issues related to acquiring a lock
on *every other node that was up*, and eventually the CPU, network, and
memory *on every other node* get saturated until the downed node is brought
back up.

I've compared cassandra.yaml, jvm.options, etc., and don't see anything
especially different.

I see two major code paths that use ViewManager.*updatesAffectsView* to
determine next steps: one in Keyspace.java and one in StorageProxy.java.
From my review, the StorageProxy path is not being hit.

I've tracked the code down to:

https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/Keyspace.java#L528

and:

https://github.com/apache/cassandra/blob/43ec1843918aba9e81d3c2dc1433a1ef4740a51f/src/java/org/apache/cassandra/db/view/ViewManager.java#L71

```
if (!enableCoordinatorBatchlog && coordinatorBatchlog)
    return false;
```


We tried setting the MV coordinator flag to true and it made no difference,
which shouldn't be the case. If the property isn't set, it should default
to false as per Java's Boolean.getBoolean method. *May try setting it to
false and seeing the behavior.*
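
For reference, Boolean.getBoolean returns true only when the named system property is present and equal to "true" (case-insensitive); an absent property, or any other value, yields false. A quick standalone sketch (it reuses the property name from the flag above; that Cassandra reads it exactly this way is my assumption, based on the field name in the snippet):

```
public class BooleanGetBooleanDemo
{
    public static void main(String[] args)
    {
        String prop = "cassandra.mv_enable_coordinator_batchlog";

        // Property not set at all -> false.
        System.clearProperty(prop);
        System.out.println("unset         -> " + Boolean.getBoolean(prop)); // false

        // Explicitly "true" -> true.
        System.setProperty(prop, "true");
        System.out.println("set to \"true\" -> " + Boolean.getBoolean(prop)); // true

        // Anything that isn't "true" (case-insensitive) -> false, e.g. "1" or "yes".
        System.setProperty(prop, "1");
        System.out.println("set to \"1\"    -> " + Boolean.getBoolean(prop)); // false
    }
}
```

Also worth double-checking: if the flag is read into a static field at class-load time (as the enableCoordinatorBatchlog name in the snippet suggests), it has to be passed as a -D option at JVM startup, e.g. via jvm.options, rather than set on a running node.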

The strategic recommendation I've made is to move away from MVs to
self-managed (application-maintained) views, and eventually to make use of
SAI if it works out. I'm still curious why the behavior would be so
drastically different in 4.0.x than in 3.11.x.
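
For anyone weighing the same move, this is roughly what I mean by a self-managed view: the application writes the base table and the denormalized lookup table itself, typically in a logged batch. A minimal sketch using the DataStax Java driver 4.x; the keyspace and table names (app.users, app.users_by_email) are made up for illustration:

```
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.BatchStatement;
import com.datastax.oss.driver.api.core.cql.BatchType;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class SelfManagedViewWrite
{
    public static void main(String[] args)
    {
        // Connects to a local node with the driver's defaults.
        try (CqlSession session = CqlSession.builder().build())
        {
            String userId = "user-123";
            String email = "someone@example.com";

            // Base-table write and the denormalized "view" write go out together
            // in a logged batch so the copy isn't silently lost if one write fails.
            SimpleStatement baseWrite = SimpleStatement.newInstance(
                "INSERT INTO app.users (user_id, email) VALUES (?, ?)", userId, email);
            SimpleStatement viewWrite = SimpleStatement.newInstance(
                "INSERT INTO app.users_by_email (email, user_id) VALUES (?, ?)", email, userId);

            BatchStatement batch = BatchStatement.builder(BatchType.LOGGED)
                                                 .addStatement(baseWrite)
                                                 .addStatement(viewWrite)
                                                 .build();
            session.execute(batch);
        }
    }
}
```

The trade-off is that the application now owns consistency between the two tables (and any backfill of the view), which is exactly the work MVs were meant to hide.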

Has anyone else seen something like this? I'm also going to try to recreate
this in a vanilla environment and will report back.

rahul.xavier.si...@gmail.com

http://cassandra.link
