[jira] [Commented] (CASSANDRA-12783) Break up large MV mutations to prevent OOMs
[ https://issues.apache.org/jira/browse/CASSANDRA-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256404#comment-16256404 ]

Kurt Greaves commented on CASSANDRA-12783:
--

An update: since this is going into 4.x it is only getting a limited amount of my attention at the moment, so I'm posting my most recent version and its current state here (so I don't forget, and in case anyone has any insights).

|[trunk|https://github.com/apache/cassandra/compare/trunk...kgreav:12783-trunk]|[dtest|https://github.com/apache/cassandra-dtest/compare/master...kgreav:12783]|

At the moment it more or less works; however, the dtest occasionally fails with a corrupt value length, even though that shouldn't happen. You can even keep the test dir, run the query again, and get the correct results with no corrupt values.

There are still a few TODOs, and it needs some performance testing, both for comparison and to see whether it's worth chunking mutations in {{store}}. LegacyBatchlogMigrator needs a few more fixes to support the new version (it's a copy of the old 3.0 version).

Also, ideally {{BatchlogManager.markBatchesActive()}} should use the batchlog to mark the batches active, so that we can guarantee that if any are marked they will eventually all be marked. At the moment that's not possible because the batchlog uses a GCGS of 0. It's probably safe to set a GCGS higher than 0; we would just end up reading some tombstones when replaying the batchlog, but it's not a great solution. Special-casing local batchlog writes is another option (there's no point consulting GCGS for a local batchlog write), but we'd also need to store the table the batch was destined for as part of the batchlog, which seems a bit over the top just for the sake of MVs.

I also considered using an unlogged batch, because that should be enough for a local write, but I couldn't figure out a clean way to issue an UNLOGGED batch. According to CASSANDRA-9283 a LOGGED batch should essentially be unlogged if it's for a single partition, but I haven't yet found the "optimisations" that guarantee this, and even then using a logged batch doesn't work because GCGS=0.

> Break up large MV mutations to prevent OOMs
> ---
>
> Key: CASSANDRA-12783
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12783
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths, Materialized Views
> Reporter: Carl Yeksigian
> Assignee: Kurt Greaves
> Fix For: 4.x
>
>
> We only use the code path added in CASSANDRA-12268 for the view builder
> because otherwise we would break the contract of the batchlog, where some
> mutations may be written and pushed out before the whole batch log has been
> saved.
> We would need to ensure that all of the updates make it to the batchlog
> before allowing the batchlog manager to try to replay them, but also before
> we start pushing out updates to the paired replicas.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12783) Break up large MV mutations to prevent OOMs
[ https://issues.apache.org/jira/browse/CASSANDRA-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152090#comment-16152090 ]

ZhaoYang commented on CASSANDRA-12783:
--

bq. separateUpdate

It's for view rebuild.

bq. in the view write path to ensure normal updates only generate a single mutation to avoid the case of splitting a MV update over multiple batches (I think?)

For a normal base update, the existing approach is to generate one collection of view mutations and put it into the batchlog to ensure eventual consistency: once the base mutation is applied, the view mutations should eventually be applied. With the new "active batches" method, we could achieve the same eventual-consistency semantics and reduce the memory footprint, e.g. 2 & 3.

bq. With the whole "active" batch method could we remove the separateUpdates case and instead split it over multiple batches, however only set them to be active once they are all written, and fail the write if any batch store errors?

We still have the problem in #2: all modified base data is held in memory. If it's a partition deletion, the entire base partition is held in memory.

Perhaps we could make {{generateViewUpdates}} entirely iterator-based and split large view mutations into small groups. In the case of a view rebuild, each element from the iterator is marked as active and can be shipped to the view immediately. In the case of a normal base update, each element from the iterator is marked as not active until all are stored.
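The iterator-based idea can be illustrated with a minimal plain-Java sketch. Everything in it is hypothetical and is not Cassandra code: {{ChunkedUpdates}}, {{chunk}}, {{StoredGroup}} and {{process}} are names invented for the illustration, and "storing" a group just records its size. It only shows the shape of the proposal: pull view updates lazily in small groups, mark each group active immediately for a rebuild, and defer activation until every group is stored for a normal base update.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ChunkedUpdates {
    /** Lazily groups elements of source into lists of at most groupSize elements. */
    static <T> Iterator<List<T>> chunk(Iterator<T> source, int groupSize) {
        if (groupSize <= 0) throw new IllegalArgumentException("groupSize must be positive");
        return new Iterator<List<T>>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public List<T> next() {
                if (!source.hasNext()) throw new NoSuchElementException();
                List<T> group = new ArrayList<>(groupSize);
                while (group.size() < groupSize && source.hasNext()) group.add(source.next());
                return group;
            }
        };
    }

    /** Hypothetical record of a stored group; only its size is kept, so updates do not pile up in memory. */
    static final class StoredGroup {
        final int size;
        boolean active;
        StoredGroup(int size, boolean active) { this.size = size; this.active = active; }
    }

    /**
     * rebuild = true:  each group is stored as active straight away.
     * rebuild = false: groups are stored inactive and activated only once all are stored.
     */
    static <T> List<StoredGroup> process(Iterator<T> viewUpdates, int groupSize, boolean rebuild) {
        List<StoredGroup> stored = new ArrayList<>();
        Iterator<List<T>> groups = chunk(viewUpdates, groupSize);
        while (groups.hasNext()) stored.add(new StoredGroup(groups.next().size(), rebuild));
        if (!rebuild) for (StoredGroup g : stored) g.active = true; // all stored, so activate together
        return stored;
    }
}
```

Only one group of at most {{groupSize}} updates is materialised at a time, which is the memory benefit being discussed; whether the real view write path can produce its updates as an iterator is exactly the open question in the comment above.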
[jira] [Commented] (CASSANDRA-12783) Break up large MV mutations to prevent OOMs
[ https://issues.apache.org/jira/browse/CASSANDRA-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152068#comment-16152068 ]

Kurt Greaves commented on CASSANDRA-12783:
--

Thanks for pointing that out about the UUIDs; the issue seems to be fixed now.

Regarding 2 & 3, I see that [~carlyeks] implemented [{{separateUpdates}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/view/TableViews.java#L265] in the view write path to ensure normal updates only generate a single mutation, to avoid splitting an MV update over multiple batches (I think?). With the whole "active" batch method, could we remove the {{separateUpdates}} case and instead split the update over multiple batches, setting them active only once they are all written and failing the write if any batch {{store}} errors? I'm not very familiar with that code, so I'm not sure it covers all the potential MV cases, but if it works I think it would be much cleaner than breaking apart batches during store.
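As a rough illustration of the "store everything, then activate" idea, here is a minimal plain-Java sketch. It is not the batchlog implementation: {{ActiveBatchSketch}}, {{BatchStore}}, {{writeAll}} and the {{failOnId}} switch are all hypothetical names made up for the example, and the store is just an in-memory map from batch id to an active flag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ActiveBatchSketch {
    /** Hypothetical in-memory stand-in for the batchlog: batch id -> active flag. */
    static final class BatchStore {
        final Map<Integer, Boolean> batches = new LinkedHashMap<>();
        final int failOnId; // simulate a store error for this id (-1 = never fail)

        BatchStore(int failOnId) { this.failOnId = failOnId; }

        void store(int id) {
            if (id == failOnId) throw new IllegalStateException("store failed for batch " + id);
            batches.put(id, false); // stored, but not yet eligible for replay
        }

        void markActive(List<Integer> ids) { for (int id : ids) batches.put(id, true); }

        void remove(List<Integer> ids) { for (int id : ids) batches.remove(id); }
    }

    /** Returns true if every batch was stored and then activated, false if the write failed. */
    static boolean writeAll(BatchStore store, int batchCount) {
        List<Integer> stored = new ArrayList<>();
        try {
            for (int id = 0; id < batchCount; id++) {
                store.store(id);
                stored.add(id);
            }
        } catch (IllegalStateException e) {
            store.remove(stored); // nothing was active yet, so nothing could have been replayed
            return false;
        }
        store.markActive(stored); // only now may the batches be replayed or shipped
        return true;
    }
}
```

The sketch treats activation as a single step that cannot partially fail. In a real batchlog, marking several batches active is itself a multi-write operation, so it would need its own guarantee that all of them eventually become active.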
[jira] [Commented] (CASSANDRA-12783) Break up large MV mutations to prevent OOMs
[ https://issues.apache.org/jira/browse/CASSANDRA-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16144131#comment-16144131 ]

ZhaoYang commented on CASSANDRA-12783:
--

The default compareTo() for a TimeUUID (version 1) doesn't compare the timestamp first and then the machine part; it compares the most significant bits (MSB), which start with time_low rather than the high-order timestamp bits. You could try {{TimeUUIDType.compare}}.

{code}
UUID V1, most significant bits:
0xFFFFFFFF00000000 time_low
0x00000000FFFF0000 time_mid
0x000000000000F000 version
0x0000000000000FFF time_hi
{code}
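The effect can be reproduced with nothing but {{java.util.UUID}}. The sketch below is only an illustration, not Cassandra's {{TimeUUIDType.compare}}: {{TimeUuidOrdering}}, {{fromTimestamp}} and {{compareByTimestamp}} are hypothetical helpers, and the UUIDs are built by hand from a raw 60-bit timestamp value so that the field layout is visible.

```java
import java.util.UUID;

final class TimeUuidOrdering {
    /** Builds a version-1 UUID whose timestamp field equals ts (hypothetical helper). */
    static UUID fromTimestamp(long ts) {
        long timeLow = ts & 0xFFFFFFFFL;         // low 32 bits of the timestamp
        long timeMid = (ts >>> 32) & 0xFFFFL;    // next 16 bits
        long timeHi = (ts >>> 48) & 0x0FFFL;     // top 12 bits
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi; // 0x1000 = version 1
        long lsb = 0x8000000000000000L;          // RFC 4122 variant, zero clock sequence and node
        return new UUID(msb, lsb);
    }

    /** Orders version-1 UUIDs by timestamp first, falling back to UUID.compareTo on ties. */
    static int compareByTimestamp(UUID a, UUID b) {
        int byTime = Long.compare(a.timestamp(), b.timestamp());
        return byTime != 0 ? byTime : a.compareTo(b);
    }

    public static void main(String[] args) {
        UUID later = fromTimestamp(0x1_0000_0000L); // time_low = 0, time_mid = 1
        UUID earlier = fromTimestamp(1L);           // time_low = 1, time_mid = 0
        // compareTo looks at the MSB, where time_low comes first, so "later" sorts first.
        System.out.println(later.compareTo(earlier) < 0);
        // Comparing the decoded timestamps gives the expected order.
        System.out.println(compareByTimestamp(later, earlier) > 0);
    }
}
```

Because time_low occupies the top 32 bits of the MSB, any two timestamps whose low 32 bits are ordered differently from the full values will sort "backwards" under {{compareTo}}.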
[jira] [Commented] (CASSANDRA-12783) Break up large MV mutations to prevent OOMs
[ https://issues.apache.org/jira/browse/CASSANDRA-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135083#comment-16135083 ]

Kurt Greaves commented on CASSANDRA-12783:
--

[trunk|https://github.com/apache/cassandra/compare/trunk...kgreav:trunk-11670?expand=1]

BatchlogManager, which is the main component, is more or less done. I haven't fixed {{LegacyBatchlogMigrator}} yet, so please ignore it (the version there is from the last batchlog migration). There is still a bit of work to do on tests, but I will continue with that after sorting out the UUID issue.

For interest's sake: in the test {{testUUIDComparison}}, id1 will occasionally be a UUID that compares as less than id2, despite id2 having a timestamp (and UUID, for that matter) that is far lower. This is a problem [here|https://github.com/apache/cassandra/compare/trunk...kgreav:trunk-11670?expand=1#diff-c77d5f1027ee5e8a49e106903a4ca937R318], where the code will occasionally delete a batch that shouldn't be expired.

I'm also totally open to ideas and criticism regarding the whole design, specifically the expiry handling, as I'm not sure yet whether it will have other implications.

cc [~pauloricardomg] if you want to have a look
[jira] [Commented] (CASSANDRA-12783) Break up large MV mutations to prevent OOMs
[ https://issues.apache.org/jira/browse/CASSANDRA-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134811#comment-16134811 ]

Kurt Greaves commented on CASSANDRA-12783:
--

Kind of. I was actually working on fixing CASSANDRA-13608, which is somewhat related, following Paulo's ideas from [here|https://issues.apache.org/jira/browse/CASSANDRA-11670?focusedCommentId=15359128=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15359128] to fix the limitation that a batchlog entry can only contain up to {{max_value_size}}. This also introduces the concept of "activating" a batch mentioned in the description. I'm not really sure that it will prevent OOMs, but the description isn't clear about what those are.

I will upload my branch somewhere shortly. I got sidetracked trying to work out why I can generate time-based UUIDs that are not ordered by time, but more on that later.
[jira] [Commented] (CASSANDRA-12783) Break up large MV mutations to prevent OOMs
[ https://issues.apache.org/jira/browse/CASSANDRA-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134653#comment-16134653 ]

ZhaoYang commented on CASSANDRA-12783:
--

[~KurtG] Hi, have you started on this issue? Would you mind sharing your approach here?