[ https://issues.apache.org/jira/browse/CASSANDRA-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711500#comment-17711500 ]
David Capwell commented on CASSANDRA-18441:
-------------------------------------------
bq. Can we agree on hardcoding the name and numeric ID in the format itself?
The name should be hard-coded by the format, but I feel that the numeric ID
isn't 100% needed... streaming uses the name and the cache uses the numeric ID,
so we "could" unify on just the name. If the concern is that a name costs more
to store than an int, we could require ASCII names, so that any name of up to 3
characters costs about the same (if not less): 1 byte for the length plus 1-3
bytes for the characters, for a total of 2-4 bytes, versus 4 bytes for an int.
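To make the size comparison concrete, here is a minimal sketch, assuming a
hypothetical encoding of one length byte followed by the ASCII bytes of the
name. FormatNameCodec and its methods are illustrative names only, not existing
Cassandra classes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical codec (not a Cassandra class): 1 length byte + ASCII bytes.
final class FormatNameCodec {
    static byte[] encode(String name) {
        if (name.isEmpty() || name.length() > Byte.MAX_VALUE)
            throw new IllegalArgumentException("name length must be 1.." + Byte.MAX_VALUE);
        if (!StandardCharsets.US_ASCII.newEncoder().canEncode(name))
            throw new IllegalArgumentException("name must be ASCII: " + name);
        byte[] chars = name.getBytes(StandardCharsets.US_ASCII);
        // One byte for the length, then one byte per ASCII character.
        return ByteBuffer.allocate(1 + chars.length)
                         .put((byte) chars.length)
                         .put(chars)
                         .array();
    }

    static String decode(byte[] encoded) {
        int length = encoded[0];
        if (length <= 0 || encoded.length != 1 + length)
            throw new IllegalArgumentException("malformed encoded name");
        return new String(encoded, 1, length, StandardCharsets.US_ASCII);
    }
}
```

Under this assumed encoding, a 3-character name such as "bti" takes exactly
Integer.BYTES (4) bytes, and anything shorter takes fewer.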
I am +0 on having a numeric ID... it just adds more surface area for conflict,
and as you point out, the operator can't do much if that happens. Projects can
easily avoid name conflicts (they happen, but unique names are very common),
whereas numeric values could conflict easily (I predict a 3rd-party format
will use id=42... it's going to happen!)
bq. Can we agree on the service loader being used to discover the supported
formats or do we want them to be enumerated in the config? I'm slightly leaning
toward the service loader approach, but not insisting on that.
I am 100% in favor of using ServiceLoader to solve this problem; relying on the
config is dangerous. I am open to other solutions similar to ServiceLoader...
but it is very common in the Java world, so I feel that 3rd parties won't
have to learn much if they want to provide a custom format.
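As a rough sketch of what ServiceLoader-based discovery could look like,
assuming a hypothetical SPI interface (SSTableFormatProvider and FormatRegistry
are made-up names for illustration, not the interfaces in trunk). Providers
would be listed in a META-INF/services file named after the interface, and a
duplicate name fails fast at startup:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical SPI: each format implementation hard-codes its own name.
interface SSTableFormatProvider {
    String name();
}

// Hypothetical registry that indexes discovered providers by name.
final class FormatRegistry {
    static Map<String, SSTableFormatProvider> index(Iterable<SSTableFormatProvider> providers) {
        Map<String, SSTableFormatProvider> byName = new LinkedHashMap<>();
        for (SSTableFormatProvider provider : providers) {
            SSTableFormatProvider previous = byName.putIfAbsent(provider.name(), provider);
            if (previous != null)
                throw new IllegalStateException("Duplicate SSTable format name '" + provider.name()
                                                + "': " + previous.getClass().getName()
                                                + " and " + provider.getClass().getName());
        }
        return byName;
    }

    // ServiceLoader is Iterable, so discovery is just indexing what it finds
    // on the classpath; with no registered providers the map is empty.
    static Map<String, SSTableFormatProvider> discover() {
        return index(ServiceLoader.load(SSTableFormatProvider.class));
    }
}
```

A 3rd party would then only need to ship a jar with an implementation and the
matching META-INF/services entry; no config change is needed for discovery.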
bq. is that it seems to suggest that we can define multiple write formats,
while we really don't want to do that.
We are defining the config for each format, not the target format to use.
[~blambov]'s example says we support 2 formats ("XyzFormat" and "BtiFormat"),
but only "BtiFormat" may have configs... and we might still be using
"XyzFormat" for writes, as that would be selected by a different config (I
propose something like "default_sstable_format: big"; the current trunk logic
uses a system property).
So to flesh my example out, we would have something like the following:
{code}
default_sstable_format: xyz
sstable_formats:
  bti:
    row_index_granularity: 4kb
    version: ca
  xyz:
    abc: true
    version: 42
{code}
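For illustration, the split between "which format to write" and "per-format
options" could be validated roughly as below. This is only a sketch under
assumptions: the parsed YAML is represented as plain maps, and
SSTableFormatConfig and selectWriteFormat are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: picks the options for the write format after checking
// that every referenced format name is one of the discovered formats.
final class SSTableFormatConfig {
    static Map<String, String> selectWriteFormat(String defaultFormat,
                                                 Map<String, Map<String, String>> formatOptions,
                                                 Set<String> knownFormats) {
        for (String configured : formatOptions.keySet())
            if (!knownFormats.contains(configured))
                throw new IllegalArgumentException("Options given for unknown sstable format: " + configured);
        if (!knownFormats.contains(defaultFormat))
            throw new IllegalArgumentException("Unknown default_sstable_format: " + defaultFormat);
        // A format without an entry under sstable_formats simply has no options.
        return formatOptions.getOrDefault(defaultFormat, Collections.emptyMap());
    }
}
```

With the config above, selecting "xyz" would return its two options, while a
known format with no entry under sstable_formats would get an empty option map.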
> Improvements to SSTable format configuration
> --------------------------------------------
>
> Key: CASSANDRA-18441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18441
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/SSTable
> Reporter: Branimir Lambov
> Assignee: Jacek Lewandowski
> Priority: Normal
> Fix For: 5.x
>
>
> CEP-17 and CASSANDRA-17056 abstracted some interfaces for SSTable format
> implementations and defined a method of plugging in specific configurations.
> This method is brittle and asks users to specify format identifiers whose
> configuration does not provide value but can be the source of conflicts and
> problems. On the other hand it makes important choices non-obvious, as the
> selection of format to write is given by the order of configured interfaces.
> An improved specification mechanism needs to be put in place before Cassandra
> 5 is released.