[jira] [Commented] (CASSANDRA-18441) Improvements to SSTable format configuration

David Capwell (Jira) Wed, 12 Apr 2023 10:43:06 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711500#comment-17711500
 ]


David Capwell commented on CASSANDRA-18441:
-------------------------------------------

bq. Can we agree on hardcoding the name and numeric ID in the format itself?

The name should be hard coded by the format. I feel that the numeric ID isn't 
100% needed: streaming uses the name and the cache uses the numeric id, so we 
"could" unify on just the name.  If there is a concern that a name costs more 
than an int to store, we could require an ASCII name, so any name of up to 3 
characters costs the same as an int or less (1 byte for the size plus 1-3 bytes 
for the chars, for a total of 2-4 bytes; an int is 4 bytes).
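
To make the storage comparison concrete, here is a minimal sketch that assumes a length-prefixed ASCII encoding (one size byte followed by one byte per character). The class and method names are hypothetical and are not part of Cassandra.

```java
import java.nio.charset.StandardCharsets;

final class FormatIdSize {
    // Assumed encoding: 1 length byte followed by 1 byte per ASCII character.
    static int encodedNameSize(String name) {
        if (name.isEmpty() || name.length() > 255)
            throw new IllegalArgumentException("name length must be 1..255");
        if (!StandardCharsets.US_ASCII.newEncoder().canEncode(name))
            throw new IllegalArgumentException("name must be ASCII: " + name);
        return 1 + name.length();
    }

    public static void main(String[] args) {
        // Compare a few short names against the fixed 4-byte cost of an int id.
        for (String name : new String[] { "big", "bti", "xyz" })
            System.out.println(name + " -> " + encodedNameSize(name)
                               + " bytes (int: " + Integer.BYTES + " bytes)");
    }
}
```

Under this assumed encoding a three-character name such as "big" takes 4 bytes, the same as an int, while each extra character adds one byte.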

I am +0 on having a numeric id... it just adds more surface area for conflict, 
and as you point out the operator can't do much if that happens.  Names make it 
easy for projects to avoid conflicts (they happen, but unique names are very 
common), but numeric values could conflict easily (I predict a 3rd party format 
will use id=42... it's going to happen!)

bq. Can we agree on the service loader being used to discover the supported 
formats or do we want them to be enumerated in the config? I'm slightly leaning 
toward the service loader approach, but not insisting on that.

I am 100% in favor of using ServiceLoader to solve this problem; relying on the 
config is dangerous. I am open to other solutions similar to ServiceLoader, 
but it is very common in the Java world, so I feel that 3rd parties won't 
have to learn much if they want to provide a custom format.
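
As a rough illustration of the ServiceLoader approach, the sketch below discovers providers and indexes them by their hard-coded name, failing fast on a name conflict. The `Provider` interface and the `FormatDiscovery` class are hypothetical stand-ins, not the actual Cassandra SSTable format API. A 3rd party would register an implementation through the usual `META-INF/services` file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

final class FormatDiscovery {
    /** Hypothetical provider interface; each format hard codes its own name. */
    public interface Provider {
        String name();
    }

    // Index providers by name, rejecting two formats that claim the same name.
    static Map<String, Provider> indexByName(Iterable<Provider> providers) {
        Map<String, Provider> byName = new LinkedHashMap<>();
        for (Provider p : providers) {
            if (byName.putIfAbsent(p.name(), p) != null)
                throw new IllegalStateException("duplicate sstable format name: " + p.name());
        }
        return byName;
    }

    // Discover whatever implementations are registered on the classpath.
    static Map<String, Provider> discover() {
        return indexByName(ServiceLoader.load(Provider.class));
    }

    public static void main(String[] args) {
        System.out.println("discovered formats: " + discover().keySet());
    }
}
```

Run on its own, with no providers registered, the discovered set should simply be empty. The point of the sketch is that the set of supported formats comes from what is on the classpath rather than from the config.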

bq. is that it seems to suggest that we can define multiple write formats, 
while we really don't want to do that.

We are defining the config for each format, not the target format to use.  
[~blambov]'s example says we support 2 formats ("XyzFormat" and "BtiFormat"), but 
only "BtiFormat" may have configs... yet we might be using "XyzFormat" for 
writes, as that would be selected by a different config (I propose something 
like "default_sstable_format: big"; the current trunk logic uses a system 
property).

So to flesh my example out, we would have something like the following:

{code}
default_sstable_format: xyz
sstable_formats:
  bti:
    row_index_granularity: 4kb
    version: ca
  xyz:
    abc: true
    version: 42
{code}
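
A minimal sketch of how such a config could be consumed, assuming it has already been parsed into nested maps. The `SSTableConfig` class is hypothetical and is not Cassandra's actual config class. It shows that the write format is chosen only by `default_sstable_format`, while `sstable_formats` merely carries per-format options.

```java
import java.util.Map;

final class SSTableConfig {
    // Hypothetical in-memory mirror of the YAML above.
    final String defaultFormat;
    final Map<String, Map<String, String>> formats;

    SSTableConfig(String defaultFormat, Map<String, Map<String, String>> formats) {
        this.defaultFormat = defaultFormat;
        this.formats = formats;
    }

    // A supported format with no entry under sstable_formats just has no options.
    Map<String, String> optionsFor(String format) {
        return formats.getOrDefault(format, Map.of());
    }

    // The write format is picked by default_sstable_format, never by entry order.
    Map<String, String> writeOptions() {
        return optionsFor(defaultFormat);
    }

    public static void main(String[] args) {
        SSTableConfig cfg = new SSTableConfig("xyz", Map.of(
            "bti", Map.of("row_index_granularity", "4kb", "version", "ca"),
            "xyz", Map.of("abc", "true", "version", "42")));
        System.out.println("write format: " + cfg.defaultFormat);
        System.out.println("write version: " + cfg.writeOptions().get("version"));
    }
}
```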

> Improvements to SSTable format configuration
> --------------------------------------------
>
>                 Key: CASSANDRA-18441
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18441
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/SSTable
>            Reporter: Branimir Lambov
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 5.x
>
>
> CEP-17 and CASSANDRA-17056 abstracted some interfaces for SSTable format 
> implementations and defined a method of plugging in specific configurations. 
> This method is brittle and asks users to specify format identifiers whose 
> configuration does not provide value but can be the source of conflicts and 
> problems. On the other hand it makes important choices non-obvious, as the 
> selection of format to write is given by the order of configured interfaces.
> An improved specification mechanism needs to be put in place before Cassandra 
> 5 is released.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
