[ 
https://issues.apache.org/jira/browse/CASSANDRA-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814547#comment-15814547
 ] 

Sylvain Lebresne commented on CASSANDRA-9425:
---------------------------------------------

I very much like the changes of this patch in general. I also agree with 
[~snazy] small follow-up, getting rid of {{keyspaceAndTablePair}} sounds good. 
The patch is big though so I have a bunch of remarks/suggestions, some of which 
may or may not be very useful. Some of the comments may also not be entirely 
new to this patch, I haven't double-checked everything, but as the main goal of 
this patch is to clean stuff up, let say I don't feel too bad if there is a few 
such remarks.

I have also a few things, mostly nits, that I just wrote while reviewing and 
pushed in branch below. One noticeable commit that is not entirely nitty is the 
introduction of a {{TableId}} class (that just "wrap" a UUID). It felts like 
this make the code read better with no real downside. It's a trivial patch, 
just with a big-ish diff.
| [9425-review|https://github.com/pcmanus/cassandra/commits/9425-review] | 
[utests|http://cassci.datastax.com/job/pcmanus-9425-review-testall] | 
[dtests|http://cassci.datastax.com/job/pcmanus-9425-review-dtest] |


Other points:
* {{TableMetadataRef.to()}} feels like it could be misused easily so I'd 
suggest making things a bit more explicit. In practice, it has 3 main usages:
*# in {{Schema}}, which is kind of the one true use, where I'd directly use the 
ctor (which would be package-protected and have the same comment than on 
{{set()}}).
*# for offline tools: I'd rename {{to()}} to something like 
{{forOfflineTool()}} that make it clear it's ok only in that context. A comment 
would be nice too.
*# for indexes, where we kind of hack around them not having their own ID. I 
think it's worth pointing out we're doing something specific by having a 
{{forIndex()}} method with proper comments (and probably an assertion that 
validate it's indeed called on an index). Or maybe go with my next point.
* Talking of index handling, 
{{TableMetadataRef.inPlaceUpdateIndexedTableMetadata}} feels dangerous. The 
{{set()}} method is very explicit about not being for public consumption, but 
that {{inPlaceUpdateIndexedTableMetadata}} method calls {{set()}} directly and 
is public. It's also particularly hard to convince oneself that it's used in a 
safe maner (you check first that {{SecondaryIndexManager.reload()}} is indeed 
only called protected by migration synchronization, but the call to 
{{inPlaceUpdateIndexedTableMetadata}} is on a task executed by an 
{{ExecutorService}} so you have to double-check it's not really asynchronous 
(despite what the javadoc in {{Index.java}} claims!) and thus is indeed 
protected).
  Anyway, at a minimum some comments are imo missing, but I wonder if we could 
make this more obviously correct by pushing it to {{Schema}}, having a map of 
'index name' -> TableMetadataRef specific to indexes. Updating that in 
{{Schema.load()}} shouldn't be terribly complex. Or am I missing a reason why 
that wouldn't work?
* I'm unclear on the changes to {{AlterTypeStatement}}. The patch removes the 
code that propagate type changes to other table/view/types, but I neither see 
where that has moved, nor why that wouldn't be needed anymore (as an aside, if 
it does is not needed anymore, than we could remove the {{updateTypes()}} and 
{{updateWith()}} methods that are unused in the patch).
* I'd rename {{addRegularColumn}} and {{alterColumnType}} in {{ViewMetadata}} 
as they are not modifying the {{ViewMetadata}} but creating copies (say to 
{{withAddedRegularColumn()}} and {{withAlteredColumnType}}).
* The patch changes some system table options (compared to current trunk), 
namely:
** {{AuthKeyspace}} tables don't get his {{memtableFlushPeriod}} to 1h.
** {{SystemDistributedKeyspace}} gets a default gcGrace, but it was of 10 days.
** None of the system table gets {{readRepairChance == 0}} (only 
{{dcLocaReadRepairChance}} is set to 0).
* In {{Indexes.validate()}}, not sure it's worth doing the duplicate name check 
since we do it at the keyspace level already (and only doing it withing a 
single table is a false sense of safety).
* {{Keyspaces}} has both a {{get(String)}} that return {{Optional}} and a 
{{getNullable(String)}}. We should probably remove one.
* There is {{TODO: FIXME}} in ColumnFamilyStore.setCompressionParameters.
* There is a {{TODO FIXME}} in SchemaKeyspace (and not 100% sure what it's 
about).
* Nit: In {{SSTableTxnWriter}}, it feels slightly inconsistent that 
{{createRangeAware}} takes a {{TableMetadata}}, but {{create}} takes a 
{{TableMetadataRef}}.
* In {{Schema}}:
** Realized afterwards that this wasn't new to this patch, but mentioning it 
nonetheless: loading the system keyspaces in Schema ctor feels dodgy, 
especially since it's conditional on some initialization methods (and it's a 
singleton). What if Schema somehow happens to be loaded too early for instance? 
Anyway, would be nice to move that code to some {{loadSystemTables()}} function 
called in the proper place. Also, the {{Schema}} ctor should probably be 
private.
** slightly uncomfortable with {{load()}} as it is. It allow both new and 
updated keyspace, but doesn't do everything it should on an update (it doesn't 
reload {{ColumnFamilyStore}}, ...). And while those other parts are done 
properly by {{merge()}} and true schema migrations, {{load()}} is public so it 
feels easy to misuse. Even for brand new keyspaces, it's a bit confusing that 
{{load()}} is only a sub-part of what {{Schema.createKeyspace()}} (it works due 
to 1) my next point and 2) the fact we happen to not care about notifications 
when load() is called, but it's a bit messy).
** Some of {{Schema.createKeyspace()}} is actually redundant: {{Keyspace}} ctor 
already initialize all tables and views, so it ends up being duplicated. It's 
largely harmless though we do end up reloading each {{ColumnFamilyStore}} for 
no reason. Taken with my previous point, I wonder if things could be refactored 
a bit so it's clearer when initialization happens.
** {{Schema.load}} and {{Schema.unload}} should probably be synchronized for 
their modification of {{keyspaces}}. I acknowledge that those method are called 
by other methods that _are_ synchronized, but 1) it feels adding 
synchronization directly on those method at least adds clarify and prevent 
future misues and 2) not all path are actually synchronized since {{load()}} is 
public (here again, the concrete usage are actually fine, so not claiming a 
bug, just suggesting making things more future proof).
** {{validateTable}} seems to replace {{Validation.validateColumnFamily}} but 
the latter is still there and used in a few instances. Would be nice to clean 
up. {{Validation.validateKeyspaceNotSystem}} could probably be migrated too for 
consistency.
** I'd maybe suggest adding a private {{handleDiff(mapDiff, Function onDropped, 
Function onAdded, Function onUpdated)}} method to simplify a bit.
** Is there a rational for notifying everything at the end, versus calling each 
notification in the "appropriate" sub-method (notifying table alter in 
{{alterTable()}}, ....)? The later would feels a tad more readable to me as 
it's slightly easier to check we haven't forgotten something.
* In {{TableMetadata}}, validateCompatibility() should probably check more 
stuffs (we can't alter non-PK column types as we want, nor change the partition 
key size for instance).

Also, as the patch does quite a bit of renaming, allow me a few additional 
suggestions. I won't get mad if you'd rather not do those though:
* I'd rename {{TableMetadata.table}} to {{TableMetadata.name}} for consistency 
with other classes ({{KeyspaceMetadata}} and {{IndexMetadata}} at least). While 
at it, I'd also rename {{ksName}} to {{keyspace}} and {{viewName}} to {{name}} 
in {{ViewMetadata}}.
* I'd also probably rename {{PartitionColumns}} to {{RegularAndStaticColumns}} 
now.

> Make node-local schema fully immutable
> --------------------------------------
>
>                 Key: CASSANDRA-9425
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9425
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Aleksey Yeschenko
>            Assignee: Aleksey Yeschenko
>             Fix For: 4.0
>
>
> The way we handle schema changes currently is inherently racy.
> All of our {{SchemaAlteringStatement}} s perform validation on a schema state 
> that's won't necessarily be there when the statement gets executed and 
> mutates schema.
> We should make all the *Metadata classes ({{KeyspaceMetadata, 
> TableMetadata}}, {{ColumnMetadata}}, immutable, and local schema persistently 
> snapshottable, with a single top-level {{AtomicReference}} to the current 
> snapshot. Have DDL statements perform validation and transformation on the 
> same state.
> In pseudo-code, think
> {code}
> public interface DDLStatement
> {
>     /**
>      * Validates that the DDL statement can be applied to the provided schema 
> snapshot.
>      *
>      * @param schema snapshot of schema before executing CREATE KEYSPACE
>      */
>     void validate(SchemaSnapshot schema);
>  
>     /**
>      * Applies the DDL statement to the provided schema snapshot.
>      * Implies that validate() has already been called on the provided 
> snapshot.
>      *
>      * @param schema snapshot of schema before executing the statement
>      * @return snapshot of schema as it would be after executing the statement
>      */
>     SchemaSnapshot transform(SchemaSnapshot schema);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to