[ 
https://issues.apache.org/jira/browse/CASSANDRA-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710920#comment-17710920
 ] 

Branimir Lambov edited comment on CASSANDRA-18441 at 4/11/23 12:39 PM:
-----------------------------------------------------------------------

SSTable read support should be able to read any variation of a format, and from 
this point of view there is little need for format configuration when it 
relates to including support for reading a specific format. In other words, I 
agree it makes most sense for the class name to be the only configuration 
parameter there. (AFAIK we have never used discovery as a method of finding 
classes to initialize.)

On the write side at the very least we need a yaml option to specify the 
selected format. I also see value in being able to give format-specific 
parameters, e.g.:
 * row index granularity
 * page size
 * the specific format version to write (enabling downgradability)

These must be configurable per node, and in the future it makes sense to also 
permit per-table configuration (e.g. a latency-sensitive table may be better 
served by a row index granularity of 1kb or less; this raises the question of 
precedence, as discussed in CASSANDRA-17240).

[~dcapwell], [~maedhroz], [~jlewandowski], how would you feel about the 
following yaml scheme:

{code}
selected_sstable_format:
    class_name: org.apache.cassandra.io.sstable.format.bti.BtiFormat
    parameters:
        row_index_granularity: 4kb
        version: ca

additional_sstable_formats:
    - org.apache.cassandra.io.sstable.format.xyz.XyzFormat
{code}

The existing formats are implicitly added to the supported list, as is the 
selected one. The {{additional}} section lists non-standard formats that we 
want to be able to read. The format names are fixed in code, and the use of the 
integer id is replaced by a hash of the name (with some startup checks that 
there are no hash collisions).
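
As a rough illustration of the paragraph above (a sketch under stated assumptions, not the actual patch), the supported list and the startup collision check could look something like the following. The class {{SSTableFormatRegistry}}, its methods, and the built-in names {{big}} and {{bti}} are hypothetical placeholders, and {{String.hashCode()}} merely stands in for whichever hash ends up being chosen.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not an existing Cassandra class.
final class SSTableFormatRegistry
{
    // Assumption: names of the formats that ship with the node and are
    // therefore always on the supported list.
    static final List<String> BUILT_IN = List.of("big", "bti");

    // Assumption: String.hashCode() stands in for the unspecified hash.
    // It is deterministic across JVMs, which an on-disk id would need.
    static int idFor(String formatName)
    {
        return formatName.hashCode();
    }

    // Builds the id -> name map for the built-in formats, the selected
    // format and any additional formats, failing fast on a hash collision.
    static Map<Integer, String> resolve(String selected, List<String> additional)
    {
        // A set, so a format listed twice is not reported as a collision.
        Set<String> names = new LinkedHashSet<>(BUILT_IN);
        names.add(selected);
        names.addAll(additional);

        Map<Integer, String> byId = new LinkedHashMap<>();
        for (String name : names)
        {
            String previous = byId.put(idFor(name), name);
            if (previous != null)
                throw new IllegalStateException(
                    "SSTable format names '" + previous + "' and '" + name +
                    "' hash to the same id " + idFor(name));
        }
        return byId;
    }
}
```

With this shape, {{resolve("bti", List.of("xyz"))}} yields three entries (the two assumed built-ins plus {{xyz}}), and two distinct names that happen to share a hash abort startup instead of silently shadowing each other.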


was (Author: blambov):
SSTable read support should be able to read any variation of a format, and from 
this point of view there is little need for format configuration when it 
relates to including support for reading a specific format. In other words, I 
agree it makes most sense for the class name to be the only configuration 
parameter there. (AFAIK we have never used discovery as a method of finding 
classes to initialize.)

On the write side at the very least we need a yaml option to specify the 
selected format. I also see value in being able to give format-specific 
parameters, e.g.:
 * row index granularity
 * page size
 * the specific format version to write (enabling downgradability)

These must be configurable by node, and in the future it makes sense to also 
permit per-table configuration (e.g. a latency-sensitive table may be better 
served by a row index granularity of 1kb or less; this brings the question of 
precedence as discussed in CASSANDRA-17240).

[~dcapwell],[~maedhroz], [~jlewandowski], how would you feel about the 
following yaml scheme:

{code}
selected_sstable_format:
    class_name: org.apache.cassandra.io.sstable.format.bti.BtiFormat
    parameters:
        row_index_granularity: 4kb
        version: ca

additional_sstable_formats:
    - org.apache.cassandra.io.sstable.format.bzy.BzyFormat
{code}

The existing formats are implicitly added to the supported list, as is the 
selected one. The `additional` section lists non-standard formats that we want 
to be able to read. The format names are fixed in code, and the use of the 
integer id is replaced by a hash of the name (with some startup checks that 
there are no hash collisions).

> Improvements to SSTable format configuration
> --------------------------------------------
>
>                 Key: CASSANDRA-18441
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18441
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/SSTable
>            Reporter: Branimir Lambov
>            Priority: Normal
>
> CEP-17 and CASSANDRA-17056 abstracted some interfaces for SSTable format 
> implementations and defined a method of plugging in specific configurations. 
> This method is brittle and asks users to specify format identifiers whose 
> configuration does not provide value but can be the source of conflicts and 
> problems. On the other hand it makes important choices non-obvious, as the 
> selection of format to write is given by the order of configured interfaces.
> An improved specification mechanism needs to be put in place before Cassandra 
> 5 is released.



