[ https://issues.apache.org/jira/browse/CASSANDRA-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710920#comment-17710920 ]
Branimir Lambov edited comment on CASSANDRA-18441 at 4/11/23 12:39 PM:
-----------------------------------------------------------------------
SSTable read support should be able to read any variation of a format, so
there is little need for configuration when it comes to enabling read support
for a specific format. In other words, I agree it makes most sense for the
class name to be the only configuration parameter there. (AFAIK we have never
used discovery as a method of finding classes to initialize.)
On the write side at the very least we need a yaml option to specify the
selected format. I also see value in being able to give format-specific
parameters, e.g.:
* row index granularity
* page size
* the specific format version to write (enabling downgradability)
These must be configurable per node, and in the future it makes sense to also
permit per-table configuration (e.g. a latency-sensitive table may be better
served by a row index granularity of 1kb or less; this raises the question of
precedence, as discussed in CASSANDRA-17240).
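To make the precedence question concrete, here is a minimal sketch of one
possible resolution rule, assuming per-table settings override per-node
settings key by key. That ordering is only an assumption for illustration; the
actual rule is the open question from CASSANDRA-17240. The class
{{FormatParameters}} and its {{resolve}} method are hypothetical and not part
of Cassandra.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; the layering order is an assumption, not a decided design. */
final class FormatParameters {
    private FormatParameters() {}

    /**
     * Merges parameter layers from left to right: a key set in a later layer
     * replaces the value from any earlier layer; keys absent from later
     * layers keep their earlier value.
     */
    @SafeVarargs
    static Map<String, String> resolve(Map<String, String>... layers) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            effective.putAll(layer);
        }
        return effective;
    }
}
```

With a node-level layer of {{row_index_granularity: 4kb, version: ca}} and a
table-level layer of {{row_index_granularity: 1kb}}, calling
{{FormatParameters.resolve(node, table)}} would give the table a granularity of
1kb while it still inherits the node's version.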
[~dcapwell], [~maedhroz], [~jlewandowski], how would you feel about the
following yaml scheme:
{code}
selected_sstable_format:
  class_name: org.apache.cassandra.io.sstable.format.bti.BtiFormat
  parameters:
    row_index_granularity: 4kb
    version: ca
additional_sstable_formats:
  - org.apache.cassandra.io.sstable.format.xyz.XyzFormat
{code}
The existing formats are implicitly added to the supported list, as is the
selected one. The {{additional}} section lists non-standard formats that we
want to be able to read. The format names are fixed in code, and the use of the
integer id is replaced by a hash of the name (with some startup checks that
there are no hash collisions).
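As a rough illustration of the name-hash idea, the sketch below derives an id
from each format name and fails at startup if two different names map to the
same id. Everything here is hypothetical: the {{FormatRegistry}} class, the
choice of CRC32 as the hash, and the format names used in the usage note are
assumptions for the example, not the proposed implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical registry of supported formats keyed by a hash of the format name. */
final class FormatRegistry {
    private final Map<Integer, String> nameById = new LinkedHashMap<>();

    /** Derives a stable id from the name; CRC32 is used purely for illustration. */
    static int idFor(String name) {
        CRC32 crc = new CRC32();
        crc.update(name.getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue();
    }

    /** Registers a name; registering the same name twice is a no-op. */
    void register(String name) {
        int id = idFor(name);
        String existing = nameById.putIfAbsent(id, name);
        if (existing != null && !existing.equals(name)) {
            // Startup check: two distinct names must never share an id.
            throw new IllegalStateException(
                "Hash collision between formats '" + existing + "' and '" + name + "'");
        }
    }

    /** Supported list = built-in formats + the selected one + the additional ones. */
    static FormatRegistry build(List<String> builtIn, String selected, List<String> additional) {
        FormatRegistry registry = new FormatRegistry();
        builtIn.forEach(registry::register);
        registry.register(selected);
        additional.forEach(registry::register);
        return registry;
    }

    String nameOf(int id) {
        return nameById.get(id);
    }

    int size() {
        return nameById.size();
    }
}
```

For example, {{FormatRegistry.build(List.of("big", "bti"), "bti", List.of("xyz"))}}
would yield three registered formats, since the selected format is already in
the built-in list and registering it again changes nothing.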
> Improvements to SSTable format configuration
> --------------------------------------------
>
> Key: CASSANDRA-18441
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18441
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/SSTable
> Reporter: Branimir Lambov
> Priority: Normal
>
> CEP-17 and CASSANDRA-17056 abstracted some interfaces for SSTable format
> implementations and defined a method of plugging in specific configurations.
> This method is brittle and asks users to specify format identifiers whose
> configuration does not provide value but can be the source of conflicts and
> problems. On the other hand it makes important choices non-obvious, as the
> selection of format to write is given by the order of configured interfaces.
> An improved specification mechanism needs to be put in place before Cassandra
> 5 is released.