[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

Joshua McKenzie (JIRA) Tue, 07 Jul 2015 12:27:29 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617221#comment-14617221
 ]


Joshua McKenzie commented on CASSANDRA-6477:
--------------------------------------------

h6. Overall:
Needs comments. Class level, javadoc where appropriate, etc.

h6. {{CFMetaData}}
* duplicate entry for comparison if triggers added in CFMetadata.triggers
* nit: space after CFMetaData.getMaterializedViews function end

h6. {{MaterializedViewManager}}
* buildIfRequired: scope of whether it's required or skipped isn't in this 
method or within view.build(), so it looks like it's just building (or at least 
submitting the builder to the CompactionManager)
* You're using SystemKeyspace.setIndexRemoved to mark the MV removed in 
removeMaterializedView but not the parallel setIndexBuilt to set them as built.
* allViews has unnecessary genericity
* nits:
** Some unused imports
** Consider renaming reload to indicate what it's reloading (e.g. reloadViews, 
reloadChickens, reloadDolphins)

h6. {{MaterializedView}}
* createForDeletionInfo:
** Consider caching/precomputing the results of CFMetaData.hasComplexColumns 
rather than iterating through them twice on each call (once on 
CFMetaData.hascomplexColumns, again in for loop in createForDeletionInfo)
*** Perhaps hooking into addOrReplaceColumnDefinition, removeColumnDefinition 
and updating MV's with the data, keep a set inside MaterializedView and 
reference that
* Don't need MaterializedView.reload() as it's just a passthrough to build and 
used in 1 place
* Tighten up scoping on member variables - some package private that can be 
explicitly private
* ctor: Duplication in the ctor in treatment of MVDefinition partition and 
clustering columns - could use a slight refactor to a function.
* createTombstone / createComplexTombstone: refactor out building 
viewClusteringValues into method to remove duplication
* targetPartitionKey: collapse spacing on:
{noformat}
return viewCfs
   .partitioner
   .decorateKey(CFMetaData
                  .serializePartitionKey(viewCfs
                                       .metadata
                                       .getKeyValidatorAsClusteringComparator()
                                       .make(partitionKey)));
{noformat}
to something like:
{noformat}
return 
viewCfs.partitioner.decorateKey(CFMetaData.serializePartitionKey(viewCfs.metadata
                                                                        
.getKeyValidatorAsClusteringComparator()
                                                                        
.make(partitionKey)));
{noformat}
* createPartitionTombstonesForUpdates: Can simply return mutation @ end of 
function
* createForDeletionInfo: unused parameter consistency
* Inconsistent naming - using mutationUnits scattered throughout vs. 
LiveRowState
* Rename {{query}} to {{queryMaterializedViewData}} or something similar that 
denotes what you're querying. Perhaps {{buildMVData}}
* nits:
** Some unused imports
** ctor: Change nonPrimaryKeyCol to allPrimaryKeyCol as true, flip to false 
when non found so we're not assigning a double-negative to 
targetHasAllPrimaryKeyColumns
** extra space in MaterializedView.createForDeletionInfo
** ctor declaration fits on 1 line < 120 char, same for some other lines that 
have been wrapped
** hte in javadoc for targetPartitionKey

h6. {{MaterializedViewDefinition}}
* Consider caching sets of columns in the MV rather than pulling from 
CFMetaData and iterating. This would allow for faster lookup and simpler code 
than having to iterate across all members (MaterializedViewDefinition.selects 
for instance).
* Consider using Sets of columns rather than Lists internally, would remove a 
lot of O(N) lookups and duplicate code logic scattered throughout the class 
(selects, renameColumn, etc)
* selects(): if included.isEmpty() is apparently an indicator that it includes 
all. We should 1) comment that and 2) wrap a method around that behavior, e.g. 
'boolean isAllInclusive()' that documents that behavior in the code.
* nits:
** I prefer copy constructors to a .copy() method and I think I've seen us err 
on the side of the copy constructor. I could be wrong here.
** assert included != null and !isEmpty on ctor
** extra whitespace in renameColumn @ top of method, @ end of method

h6. {{LiveRowState}}
* The name conflates with LivenessInfo (at least for me) which we discussed on 
IRC. My biggest hurdle to grokking this class was un-plugging that and 
re-plugging the idea that it's really about materializing the data for the 
MaterializedView. Perhaps rename to something like {{TemporalRow}}, with 
{{LRSCell}} becoming {{TemporalCell}}? While LRSCell doesn't really have any 
temporal data above and beyond a general Cell, it does contain its own 
{{reconcile}} that takes temporality into account which makes sense.
* Instead of {{addUnit}} / {{getExistingUnit}} in LiveRowState.Set, if going 
with the above these could be named {{addRow}}, {{getExistingRow}}
* Think we can get rid of getInternedUnit entirely and just have a section in 
addUnit to initialize empty entries
* Consider having LRSCell implement db.rows.Cell and adding a 
{{Conflicts.resolveCells(Cell left, Cell right, int nowInSec)}} method that 
passes through to {{Conficts.resolveRegular}}. Simplifies up a few different 
users of the method throughout the code-base.
* Can do away w/PrivateResolver vs. Resolver. LRSCell being private should 
prevent non-class anonymous instantiations of the interface.
* Various unused methods and constructors
* Comment on whether or not retention of ordering is important in {{baseSlice}} 
as you could do away w/interim ByteBuffer array if not. I believe it is, but 
worth clarifying.
* nit: whitespace inconsistencies in {{values(...)}}

I definitely have more to review but figured I'd get the feedback I have thus 
far into here. I'm considering doing subsequent reviews as a branch I push to 
github w/comments inline (similar to what Benedict's been doing lately) with 
feedback as putting this quantity of feedback in Jira comments is a burden for 
both you and I - thoughts?

> Materialized Views (was: Global Indexes)
> ----------------------------------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0 beta 1
>
>         Attachments: test-view-data.sh
>
>
> Local indexes are suitable for low-cardinality data, where spreading the 
> index across the cluster is a Good Thing.  However, for high-cardinality 
> data, local indexes require querying most nodes in the cluster even if only a 
> handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

Reply via email to