[
https://issues.apache.org/jira/browse/CASSANDRA-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814547#comment-15814547
]
Sylvain Lebresne commented on CASSANDRA-9425:
---------------------------------------------
I very much like the changes of this patch in general. I also agree with
[~snazy] small follow-up, getting rid of {{keyspaceAndTablePair}} sounds good.
The patch is big though so I have a bunch of remarks/suggestions, some of which
may or may not be very useful. Some of the comments may also not be entirely
new to this patch, I haven't double-checked everything, but as the main goal of
this patch is to clean stuff up, let say I don't feel too bad if there is a few
such remarks.
I have also a few things, mostly nits, that I just wrote while reviewing and
pushed in branch below. One noticeable commit that is not entirely nitty is the
introduction of a {{TableId}} class (that just "wrap" a UUID). It felts like
this make the code read better with no real downside. It's a trivial patch,
just with a big-ish diff.
| [9425-review|https://github.com/pcmanus/cassandra/commits/9425-review] |
[utests|http://cassci.datastax.com/job/pcmanus-9425-review-testall] |
[dtests|http://cassci.datastax.com/job/pcmanus-9425-review-dtest] |
Other points:
* {{TableMetadataRef.to()}} feels like it could be misused easily so I'd
suggest making things a bit more explicit. In practice, it has 3 main usages:
*# in {{Schema}}, which is kind of the one true use, where I'd directly use the
ctor (which would be package-protected and have the same comment than on
{{set()}}).
*# for offline tools: I'd rename {{to()}} to something like
{{forOfflineTool()}} that make it clear it's ok only in that context. A comment
would be nice too.
*# for indexes, where we kind of hack around them not having their own ID. I
think it's worth pointing out we're doing something specific by having a
{{forIndex()}} method with proper comments (and probably an assertion that
validate it's indeed called on an index). Or maybe go with my next point.
* Talking of index handling,
{{TableMetadataRef.inPlaceUpdateIndexedTableMetadata}} feels dangerous. The
{{set()}} method is very explicit about not being for public consumption, but
that {{inPlaceUpdateIndexedTableMetadata}} method calls {{set()}} directly and
is public. It's also particularly hard to convince oneself that it's used in a
safe maner (you check first that {{SecondaryIndexManager.reload()}} is indeed
only called protected by migration synchronization, but the call to
{{inPlaceUpdateIndexedTableMetadata}} is on a task executed by an
{{ExecutorService}} so you have to double-check it's not really asynchronous
(despite what the javadoc in {{Index.java}} claims!) and thus is indeed
protected).
Anyway, at a minimum some comments are imo missing, but I wonder if we could
make this more obviously correct by pushing it to {{Schema}}, having a map of
'index name' -> TableMetadataRef specific to indexes. Updating that in
{{Schema.load()}} shouldn't be terribly complex. Or am I missing a reason why
that wouldn't work?
* I'm unclear on the changes to {{AlterTypeStatement}}. The patch removes the
code that propagate type changes to other table/view/types, but I neither see
where that has moved, nor why that wouldn't be needed anymore (as an aside, if
it does is not needed anymore, than we could remove the {{updateTypes()}} and
{{updateWith()}} methods that are unused in the patch).
* I'd rename {{addRegularColumn}} and {{alterColumnType}} in {{ViewMetadata}}
as they are not modifying the {{ViewMetadata}} but creating copies (say to
{{withAddedRegularColumn()}} and {{withAlteredColumnType}}).
* The patch changes some system table options (compared to current trunk),
namely:
** {{AuthKeyspace}} tables don't get his {{memtableFlushPeriod}} to 1h.
** {{SystemDistributedKeyspace}} gets a default gcGrace, but it was of 10 days.
** None of the system table gets {{readRepairChance == 0}} (only
{{dcLocaReadRepairChance}} is set to 0).
* In {{Indexes.validate()}}, not sure it's worth doing the duplicate name check
since we do it at the keyspace level already (and only doing it withing a
single table is a false sense of safety).
* {{Keyspaces}} has both a {{get(String)}} that return {{Optional}} and a
{{getNullable(String)}}. We should probably remove one.
* There is {{TODO: FIXME}} in ColumnFamilyStore.setCompressionParameters.
* There is a {{TODO FIXME}} in SchemaKeyspace (and not 100% sure what it's
about).
* Nit: In {{SSTableTxnWriter}}, it feels slightly inconsistent that
{{createRangeAware}} takes a {{TableMetadata}}, but {{create}} takes a
{{TableMetadataRef}}.
* In {{Schema}}:
** Realized afterwards that this wasn't new to this patch, but mentioning it
nonetheless: loading the system keyspaces in Schema ctor feels dodgy,
especially since it's conditional on some initialization methods (and it's a
singleton). What if Schema somehow happens to be loaded too early for instance?
Anyway, would be nice to move that code to some {{loadSystemTables()}} function
called in the proper place. Also, the {{Schema}} ctor should probably be
private.
** slightly uncomfortable with {{load()}} as it is. It allow both new and
updated keyspace, but doesn't do everything it should on an update (it doesn't
reload {{ColumnFamilyStore}}, ...). And while those other parts are done
properly by {{merge()}} and true schema migrations, {{load()}} is public so it
feels easy to misuse. Even for brand new keyspaces, it's a bit confusing that
{{load()}} is only a sub-part of what {{Schema.createKeyspace()}} (it works due
to 1) my next point and 2) the fact we happen to not care about notifications
when load() is called, but it's a bit messy).
** Some of {{Schema.createKeyspace()}} is actually redundant: {{Keyspace}} ctor
already initialize all tables and views, so it ends up being duplicated. It's
largely harmless though we do end up reloading each {{ColumnFamilyStore}} for
no reason. Taken with my previous point, I wonder if things could be refactored
a bit so it's clearer when initialization happens.
** {{Schema.load}} and {{Schema.unload}} should probably be synchronized for
their modification of {{keyspaces}}. I acknowledge that those method are called
by other methods that _are_ synchronized, but 1) it feels adding
synchronization directly on those method at least adds clarify and prevent
future misues and 2) not all path are actually synchronized since {{load()}} is
public (here again, the concrete usage are actually fine, so not claiming a
bug, just suggesting making things more future proof).
** {{validateTable}} seems to replace {{Validation.validateColumnFamily}} but
the latter is still there and used in a few instances. Would be nice to clean
up. {{Validation.validateKeyspaceNotSystem}} could probably be migrated too for
consistency.
** I'd maybe suggest adding a private {{handleDiff(mapDiff, Function onDropped,
Function onAdded, Function onUpdated)}} method to simplify a bit.
** Is there a rational for notifying everything at the end, versus calling each
notification in the "appropriate" sub-method (notifying table alter in
{{alterTable()}}, ....)? The later would feels a tad more readable to me as
it's slightly easier to check we haven't forgotten something.
* In {{TableMetadata}}, validateCompatibility() should probably check more
stuffs (we can't alter non-PK column types as we want, nor change the partition
key size for instance).
Also, as the patch does quite a bit of renaming, allow me a few additional
suggestions. I won't get mad if you'd rather not do those though:
* I'd rename {{TableMetadata.table}} to {{TableMetadata.name}} for consistency
with other classes ({{KeyspaceMetadata}} and {{IndexMetadata}} at least). While
at it, I'd also rename {{ksName}} to {{keyspace}} and {{viewName}} to {{name}}
in {{ViewMetadata}}.
* I'd also probably rename {{PartitionColumns}} to {{RegularAndStaticColumns}}
now.
> Make node-local schema fully immutable
> --------------------------------------
>
> Key: CASSANDRA-9425
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9425
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Aleksey Yeschenko
> Assignee: Aleksey Yeschenko
> Fix For: 4.0
>
>
> The way we handle schema changes currently is inherently racy.
> All of our {{SchemaAlteringStatement}} s perform validation on a schema state
> that's won't necessarily be there when the statement gets executed and
> mutates schema.
> We should make all the *Metadata classes ({{KeyspaceMetadata,
> TableMetadata}}, {{ColumnMetadata}}, immutable, and local schema persistently
> snapshottable, with a single top-level {{AtomicReference}} to the current
> snapshot. Have DDL statements perform validation and transformation on the
> same state.
> In pseudo-code, think
> {code}
> public interface DDLStatement
> {
> /**
> * Validates that the DDL statement can be applied to the provided schema
> snapshot.
> *
> * @param schema snapshot of schema before executing CREATE KEYSPACE
> */
> void validate(SchemaSnapshot schema);
>
> /**
> * Applies the DDL statement to the provided schema snapshot.
> * Implies that validate() has already been called on the provided
> snapshot.
> *
> * @param schema snapshot of schema before executing the statement
> * @return snapshot of schema as it would be after executing the statement
> */
> SchemaSnapshot transform(SchemaSnapshot schema);
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)