[ https://issues.apache.org/jira/browse/CASSANDRA-12268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Yeksigian updated CASSANDRA-12268: --------------------------------------- Status: Patch Available (was: Awaiting Feedback) I've pushed an update - it was due to the initial collection of mutations being empty. I'm rerunning CI for 3.0 and trunk now. > Make MV Index creation robust for wide referent rows > ---------------------------------------------------- > > Key: CASSANDRA-12268 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12268 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Jonathan Shook > Assignee: Carl Yeksigian > Fix For: 3.0.x, 3.x > > Attachments: 12268.py > > > When creating an index for a materialized view for extant data, heap pressure > is very dependent on the cardinality of of rows associated with each index > value. With the way that per-index value rows are created within the index, > this can cause unbounded heap pressure, which can cause OOM. This appears to > be a side-effect of how each index row is applied atomically as with batches. > The commit logs can accumulate enough during the process to prevent the node > from being restarted. Given that this occurs during global index creation, > this can happen on multiple nodes, making stable recovery of a node set > difficult, as co-replicas become unavailable to assist in back-filling data > from commitlogs. > While it is understandable that you want to avoid having relatively wide rows > even in materialized views, this represents a particularly difficult > scenario for triage. > The basic recommendation for improving this is to sub-group the index > creation into smaller chunks internally, providing a maximal bound against > the heap pressure when it is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)