[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

Joshua McKenzie (JIRA) Mon, 13 Jul 2015 13:02:35 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625243#comment-14625243
 ]


Joshua McKenzie commented on CASSANDRA-6477:
--------------------------------------------

Hate to interrupt this lively debate about null handling, but I have more 
feedback for Carl.

Last set of style/lower level feedback here. Going to start tracing the various 
paths and confirm that the current logic affords the guarantees we think they 
do as well as look into the schema a bit more.

A couple of questions:
# How clear to users is it that CAS operations are disabled by adding a MV to a 
CF? Is this going to be documentation-specific or is this going to be something 
we pop up in CQL as well?
# In MaterializedViewLongTest, why are you ignoring all Throwables from the 
BatchLogManager.instance.startBatchlogReplay().get() call?

h6. {{MaterializedView}}
* Consider renaming cfModifiesSelectedColumn. Maybe 
"updateAffectsView(AbstractPartitionData apd)"?
** Same w/MaterializedViewManager.cfModifiesSelectedColumn. "cfModifies" is 
unclear - what's the cf that's being referred to here, the legacy 
"ColumnFamily" nomenclature that's no longer used that's actually a 
PartitionUpdate?

h6. {{DeletionTimeArray}}
* Looks like an accidental change to isLive snuck in

h6. {{Keyspace}}
* I have some concerns about the garbage/performance impact of a call to 
MaterializedViewManager.touchesSelectedColumn on each Keyspace.apply. While I'm 
fine letting MV performance optimization efforts be a follow-on effort (largely 
due to the fact that the MV code is well abstracted), I'd like some more 
testing / assurances that the places where we've modified the critical path 
don't introduce performance regressions.

h6. {{RowUpdateBuilder}}
* Need to rephrase the following error message: "since no clustering hasn't 
been provided". I realize you just copied the one that was already in 
addListEntry, but may as well fix the double-negatives while we're here.

h6. {{SystemKeyspace}}
* Consider renaming the schema values for MATERIALIZEDVIEW_BUILDS and 
BUILT_MATERIALIZEDVIEWS to be consistent w/other schema entries. Maybe follow 
COMPACTIONS_IN_PROGRESS w/MATERIALIZED_VIEW_BUILDS_IN_PROGRESS = 
"materielizedviews_builds_in_progress"? The current values don't immediately 
differentiate themselves from one another to me.

h6. {{TrncateVerbHandler}}
* We should probably move MV truncation into cfs.truncateBlocking as MV are 
children of the CFS, rather than have it being a part of the VerbHandler.

h6. {{StorageProxy}}
* Need clearer differentiation between the 2 wrapBatchResponseHandler methods 
rather than just a comment. If initiating writes is a side-effect of the 
method, its name should reflect that.
* I'm also not sure that comment is accurate as the difference between the 2 
methods currently appears to be only around naturalEndpoint determination

h6. {{MaterializedViewLongTest}}
* ttlTest looks like it should be a regular unit test rather than a long

h6. {{MaterializedViewTest}}
* Comment on why you're swallowing exceptions
* Comment tests to indicate the use-cases they're testing
* Do we really need to sleep in 1000ms increments on a CREATE MATERIALIZED VIEW 
statement? (This is more a question on expected performance w/a base CF w/100 
records than anything)
* testAllTypes is going to atrophy when new types are introduced

h6. {{MaterializedViewUtilsTest}}
* Needs comments

h6. {{nits}}
* Unused imports in Keyspace.java
* Double-whitespace in Keyspace.apply
* Unused import in RowUpdateBuilder.java
* Double-whitespace in StorageProxy.mutateAtomically
* Remove left over debug logging in StorageProxy.mutateAtomically
* MaterializedViewLongTest: double whitespacing, unused import
* MaterializedViewTest: unused imports, double whitespace
* MaterializedViewUtilsTest: unused imports

h6. Things I'd like to see performance tested
* Pre-MV vs. Post-MV patch, regular read/write/mixed throughput and latency
* Pre-MV vs. Post-MV patch, batchlog stress

Given the changes to non-MV queries on the critical path, I think it's 
important we at least confirm we're not introducing any obvious performance 
regressions to those query types when excluding the MV code-paths from apply 
calls.

> Materialized Views (was: Global Indexes)
> ----------------------------------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0 beta 1
>
>         Attachments: test-view-data.sh, users.yaml
>
>
> Local indexes are suitable for low-cardinality data, where spreading the 
> index across the cluster is a Good Thing.  However, for high-cardinality 
> data, local indexes require querying most nodes in the cluster even if only a 
> handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

Reply via email to