[ https://issues.apache.org/jira/browse/CASSANDRA-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708330#comment-14708330 ]
Benedict commented on CASSANDRA-9459: ------------------------------------- A nit question relating to this ticket: In each of the main Transaction implementations we seem to be making unnecessary use of streams. These are a "hot" code paths on systems using indexes, and I'm not convinced we should be incurring the GC burden, especially when clarity isn't meaningfully improved. Compare {code} try (OpOrder.Group opGroup = Keyspace.writeOrder.start()) { Index.Indexer[] indexers = Arrays.stream(indexes) .map(i -> i.indexerFor(key, nowInSec, opGroup, Type.CLEANUP)) .toArray(Index.Indexer[]::new); Arrays.stream(indexers).forEach(Index.Indexer::begin); if (partitionDelete != null) Arrays.stream(indexers).forEach(indexer -> indexer.partitionDelete(partitionDelete)); if (row != null) Arrays.stream(indexers).forEach(indexer -> indexer.removeRow(row)); Arrays.stream(indexers).forEach(Index.Indexer::finish); } {code} with {code} try (OpOrder.Group opGroup = Keyspace.writeOrder.start()) { for (Index index : indexes) { Index.Indexer indexer = index.indexerFor(key, nowInSec, opGroup, Type.CLEANUP); indexer.begin(); if (partitionDelete != null) indexer.partitionDelete(partitionDelete); if (row != null) indexer.removeRow(row); indexer.finish(); } } {code} I'm pretty convinced the latter is clearer, and in the former I count at least 8 unnecessary allocations for the first stream, and 4 for each of the rest. A few of these allocations are >= 64Kb, and I estimate total allocation per row on the order of (but probably a little less than) 1Kb. Conversely, the old-style code performs no allocations besides that of the {{Indexer}}. > SecondaryIndex API redesign > --------------------------- > > Key: CASSANDRA-9459 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9459 > Project: Cassandra > Issue Type: Improvement > Reporter: Sam Tunnicliffe > Assignee: Sam Tunnicliffe > Fix For: 3.0 beta 1 > > > For some time now the index subsystem has been a pain point and in large part > this is due to the way that the APIs and principal classes have grown > organically over the years. It would be a good idea to conduct a wholesale > review of the area and see if we can come up with something a bit more > coherent. > A few starting points: > * There's a lot in AbstractPerColumnSecondaryIndex & its subclasses which > could be pulled up into SecondaryIndexSearcher (note that to an extent, this > is done in CASSANDRA-8099). > * SecondayIndexManager is overly complex and several of its functions should > be simplified/re-examined. The handling of which columns are indexed and > index selection on both the read and write paths are somewhat dense and > unintuitive. > * The SecondaryIndex class hierarchy is rather convoluted and could use some > serious rework. > There are a number of outstanding tickets which we should be able to roll > into this higher level one as subtasks (but I'll defer doing that until > getting into the details of the redesign): > * CASSANDRA-7771 > * CASSANDRA-8103 > * CASSANDRA-9041 > * CASSANDRA-4458 > * CASSANDRA-8505 > Whilst they're not hard dependencies, I propose that this be done on top of > both CASSANDRA-8099 and CASSANDRA-6717. The former largely because the > storage engine changes may facilitate a friendlier index API, but also > because of the changes to SIS mentioned above. As for 6717, the changes to > schema tables there will help facilitate CASSANDRA-7771. -- This message was sent by Atlassian JIRA (v6.3.4#6332)