[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625243#comment-14625243 ]
Joshua McKenzie commented on CASSANDRA-6477: -------------------------------------------- Hate to interrupt this lively debate about null handling, but I have more feedback for Carl. Last set of style/lower level feedback here. Going to start tracing the various paths and confirm that the current logic affords the guarantees we think they do as well as look into the schema a bit more. A couple of questions: # How clear to users is it that CAS operations are disabled by adding a MV to a CF? Is this going to be documentation-specific or is this going to be something we pop up in CQL as well? # In MaterializedViewLongTest, why are you ignoring all Throwables from the BatchLogManager.instance.startBatchlogReplay().get() call? h6. {{MaterializedView}} * Consider renaming cfModifiesSelectedColumn. Maybe "updateAffectsView(AbstractPartitionData apd)"? ** Same w/MaterializedViewManager.cfModifiesSelectedColumn. "cfModifies" is unclear - what's the cf that's being referred to here, the legacy "ColumnFamily" nomenclature that's no longer used that's actually a PartitionUpdate? h6. {{DeletionTimeArray}} * Looks like an accidental change to isLive snuck in h6. {{Keyspace}} * I have some concerns about the garbage/performance impact of a call to MaterializedViewManager.touchesSelectedColumn on each Keyspace.apply. While I'm fine letting MV performance optimization efforts be a follow-on effort (largely due to the fact that the MV code is well abstracted), I'd like some more testing / assurances that the places where we've modified the critical path don't introduce performance regressions. h6. {{RowUpdateBuilder}} * Need to rephrase the following error message: "since no clustering hasn't been provided". I realize you just copied the one that was already in addListEntry, but may as well fix the double-negatives while we're here. h6. {{SystemKeyspace}} * Consider renaming the schema values for MATERIALIZEDVIEW_BUILDS and BUILT_MATERIALIZEDVIEWS to be consistent w/other schema entries. Maybe follow COMPACTIONS_IN_PROGRESS w/MATERIALIZED_VIEW_BUILDS_IN_PROGRESS = "materielizedviews_builds_in_progress"? The current values don't immediately differentiate themselves from one another to me. h6. {{TrncateVerbHandler}} * We should probably move MV truncation into cfs.truncateBlocking as MV are children of the CFS, rather than have it being a part of the VerbHandler. h6. {{StorageProxy}} * Need clearer differentiation between the 2 wrapBatchResponseHandler methods rather than just a comment. If initiating writes is a side-effect of the method, its name should reflect that. * I'm also not sure that comment is accurate as the difference between the 2 methods currently appears to be only around naturalEndpoint determination h6. {{MaterializedViewLongTest}} * ttlTest looks like it should be a regular unit test rather than a long h6. {{MaterializedViewTest}} * Comment on why you're swallowing exceptions * Comment tests to indicate the use-cases they're testing * Do we really need to sleep in 1000ms increments on a CREATE MATERIALIZED VIEW statement? (This is more a question on expected performance w/a base CF w/100 records than anything) * testAllTypes is going to atrophy when new types are introduced h6. {{MaterializedViewUtilsTest}} * Needs comments h6. {{nits}} * Unused imports in Keyspace.java * Double-whitespace in Keyspace.apply * Unused import in RowUpdateBuilder.java * Double-whitespace in StorageProxy.mutateAtomically * Remove left over debug logging in StorageProxy.mutateAtomically * MaterializedViewLongTest: double whitespacing, unused import * MaterializedViewTest: unused imports, double whitespace * MaterializedViewUtilsTest: unused imports h6. Things I'd like to see performance tested * Pre-MV vs. Post-MV patch, regular read/write/mixed throughput and latency * Pre-MV vs. Post-MV patch, batchlog stress Given the changes to non-MV queries on the critical path, I think it's important we at least confirm we're not introducing any obvious performance regressions to those query types when excluding the MV code-paths from apply calls. > Materialized Views (was: Global Indexes) > ---------------------------------------- > > Key: CASSANDRA-6477 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6477 > Project: Cassandra > Issue Type: New Feature > Components: API, Core > Reporter: Jonathan Ellis > Assignee: Carl Yeksigian > Labels: cql > Fix For: 3.0 beta 1 > > Attachments: test-view-data.sh, users.yaml > > > Local indexes are suitable for low-cardinality data, where spreading the > index across the cluster is a Good Thing. However, for high-cardinality > data, local indexes require querying most nodes in the cluster even if only a > handful of rows is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)