[ https://issues.apache.org/jira/browse/CASSANDRA-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814547#comment-15814547 ]
Sylvain Lebresne commented on CASSANDRA-9425: --------------------------------------------- I very much like the changes of this patch in general. I also agree with [~snazy] small follow-up, getting rid of {{keyspaceAndTablePair}} sounds good. The patch is big though so I have a bunch of remarks/suggestions, some of which may or may not be very useful. Some of the comments may also not be entirely new to this patch, I haven't double-checked everything, but as the main goal of this patch is to clean stuff up, let say I don't feel too bad if there is a few such remarks. I have also a few things, mostly nits, that I just wrote while reviewing and pushed in branch below. One noticeable commit that is not entirely nitty is the introduction of a {{TableId}} class (that just "wrap" a UUID). It felts like this make the code read better with no real downside. It's a trivial patch, just with a big-ish diff. | [9425-review|https://github.com/pcmanus/cassandra/commits/9425-review] | [utests|http://cassci.datastax.com/job/pcmanus-9425-review-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-9425-review-dtest] | Other points: * {{TableMetadataRef.to()}} feels like it could be misused easily so I'd suggest making things a bit more explicit. In practice, it has 3 main usages: *# in {{Schema}}, which is kind of the one true use, where I'd directly use the ctor (which would be package-protected and have the same comment than on {{set()}}). *# for offline tools: I'd rename {{to()}} to something like {{forOfflineTool()}} that make it clear it's ok only in that context. A comment would be nice too. *# for indexes, where we kind of hack around them not having their own ID. I think it's worth pointing out we're doing something specific by having a {{forIndex()}} method with proper comments (and probably an assertion that validate it's indeed called on an index). Or maybe go with my next point. * Talking of index handling, {{TableMetadataRef.inPlaceUpdateIndexedTableMetadata}} feels dangerous. The {{set()}} method is very explicit about not being for public consumption, but that {{inPlaceUpdateIndexedTableMetadata}} method calls {{set()}} directly and is public. It's also particularly hard to convince oneself that it's used in a safe maner (you check first that {{SecondaryIndexManager.reload()}} is indeed only called protected by migration synchronization, but the call to {{inPlaceUpdateIndexedTableMetadata}} is on a task executed by an {{ExecutorService}} so you have to double-check it's not really asynchronous (despite what the javadoc in {{Index.java}} claims!) and thus is indeed protected). Anyway, at a minimum some comments are imo missing, but I wonder if we could make this more obviously correct by pushing it to {{Schema}}, having a map of 'index name' -> TableMetadataRef specific to indexes. Updating that in {{Schema.load()}} shouldn't be terribly complex. Or am I missing a reason why that wouldn't work? * I'm unclear on the changes to {{AlterTypeStatement}}. The patch removes the code that propagate type changes to other table/view/types, but I neither see where that has moved, nor why that wouldn't be needed anymore (as an aside, if it does is not needed anymore, than we could remove the {{updateTypes()}} and {{updateWith()}} methods that are unused in the patch). * I'd rename {{addRegularColumn}} and {{alterColumnType}} in {{ViewMetadata}} as they are not modifying the {{ViewMetadata}} but creating copies (say to {{withAddedRegularColumn()}} and {{withAlteredColumnType}}). * The patch changes some system table options (compared to current trunk), namely: ** {{AuthKeyspace}} tables don't get his {{memtableFlushPeriod}} to 1h. ** {{SystemDistributedKeyspace}} gets a default gcGrace, but it was of 10 days. ** None of the system table gets {{readRepairChance == 0}} (only {{dcLocaReadRepairChance}} is set to 0). * In {{Indexes.validate()}}, not sure it's worth doing the duplicate name check since we do it at the keyspace level already (and only doing it withing a single table is a false sense of safety). * {{Keyspaces}} has both a {{get(String)}} that return {{Optional}} and a {{getNullable(String)}}. We should probably remove one. * There is {{TODO: FIXME}} in ColumnFamilyStore.setCompressionParameters. * There is a {{TODO FIXME}} in SchemaKeyspace (and not 100% sure what it's about). * Nit: In {{SSTableTxnWriter}}, it feels slightly inconsistent that {{createRangeAware}} takes a {{TableMetadata}}, but {{create}} takes a {{TableMetadataRef}}. * In {{Schema}}: ** Realized afterwards that this wasn't new to this patch, but mentioning it nonetheless: loading the system keyspaces in Schema ctor feels dodgy, especially since it's conditional on some initialization methods (and it's a singleton). What if Schema somehow happens to be loaded too early for instance? Anyway, would be nice to move that code to some {{loadSystemTables()}} function called in the proper place. Also, the {{Schema}} ctor should probably be private. ** slightly uncomfortable with {{load()}} as it is. It allow both new and updated keyspace, but doesn't do everything it should on an update (it doesn't reload {{ColumnFamilyStore}}, ...). And while those other parts are done properly by {{merge()}} and true schema migrations, {{load()}} is public so it feels easy to misuse. Even for brand new keyspaces, it's a bit confusing that {{load()}} is only a sub-part of what {{Schema.createKeyspace()}} (it works due to 1) my next point and 2) the fact we happen to not care about notifications when load() is called, but it's a bit messy). ** Some of {{Schema.createKeyspace()}} is actually redundant: {{Keyspace}} ctor already initialize all tables and views, so it ends up being duplicated. It's largely harmless though we do end up reloading each {{ColumnFamilyStore}} for no reason. Taken with my previous point, I wonder if things could be refactored a bit so it's clearer when initialization happens. ** {{Schema.load}} and {{Schema.unload}} should probably be synchronized for their modification of {{keyspaces}}. I acknowledge that those method are called by other methods that _are_ synchronized, but 1) it feels adding synchronization directly on those method at least adds clarify and prevent future misues and 2) not all path are actually synchronized since {{load()}} is public (here again, the concrete usage are actually fine, so not claiming a bug, just suggesting making things more future proof). ** {{validateTable}} seems to replace {{Validation.validateColumnFamily}} but the latter is still there and used in a few instances. Would be nice to clean up. {{Validation.validateKeyspaceNotSystem}} could probably be migrated too for consistency. ** I'd maybe suggest adding a private {{handleDiff(mapDiff, Function onDropped, Function onAdded, Function onUpdated)}} method to simplify a bit. ** Is there a rational for notifying everything at the end, versus calling each notification in the "appropriate" sub-method (notifying table alter in {{alterTable()}}, ....)? The later would feels a tad more readable to me as it's slightly easier to check we haven't forgotten something. * In {{TableMetadata}}, validateCompatibility() should probably check more stuffs (we can't alter non-PK column types as we want, nor change the partition key size for instance). Also, as the patch does quite a bit of renaming, allow me a few additional suggestions. I won't get mad if you'd rather not do those though: * I'd rename {{TableMetadata.table}} to {{TableMetadata.name}} for consistency with other classes ({{KeyspaceMetadata}} and {{IndexMetadata}} at least). While at it, I'd also rename {{ksName}} to {{keyspace}} and {{viewName}} to {{name}} in {{ViewMetadata}}. * I'd also probably rename {{PartitionColumns}} to {{RegularAndStaticColumns}} now. > Make node-local schema fully immutable > -------------------------------------- > > Key: CASSANDRA-9425 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9425 > Project: Cassandra > Issue Type: Sub-task > Reporter: Aleksey Yeschenko > Assignee: Aleksey Yeschenko > Fix For: 4.0 > > > The way we handle schema changes currently is inherently racy. > All of our {{SchemaAlteringStatement}} s perform validation on a schema state > that's won't necessarily be there when the statement gets executed and > mutates schema. > We should make all the *Metadata classes ({{KeyspaceMetadata, > TableMetadata}}, {{ColumnMetadata}}, immutable, and local schema persistently > snapshottable, with a single top-level {{AtomicReference}} to the current > snapshot. Have DDL statements perform validation and transformation on the > same state. > In pseudo-code, think > {code} > public interface DDLStatement > { > /** > * Validates that the DDL statement can be applied to the provided schema > snapshot. > * > * @param schema snapshot of schema before executing CREATE KEYSPACE > */ > void validate(SchemaSnapshot schema); > > /** > * Applies the DDL statement to the provided schema snapshot. > * Implies that validate() has already been called on the provided > snapshot. > * > * @param schema snapshot of schema before executing the statement > * @return snapshot of schema as it would be after executing the statement > */ > SchemaSnapshot transform(SchemaSnapshot schema); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)