[ https://issues.apache.org/jira/browse/CASSANDRA-18534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773559#comment-17773559 ]

Stefan Miklosovic edited comment on CASSANDRA-18534 at 10/10/23 6:58 AM:
-------------------------------------------------------------------------

I would prefer to merge CASSANDRA-18872 first, which removes crc_check_chance 
from compression. That means 5.0 will _not_ have crc_check_chance in 
compression anymore.

Then you could rebase this work against 5.0, where crc_check_chance will no 
longer be in compression, and tweak the FileHandler builder to propagate the 
sstable format option there so that the two are aligned.

[~maxwellguo] [~blambov] how does this sound to you?

BTW I think this ticket as a whole needs an ML thread. We are changing CQL 
here and it would be great to involve more people in this.

What seems to be a little bit "strange" to me is that we chose these properties:

row_index_granularity
bloom_filter_fp_chance
crc_check_chance
min/max_index_interval

But _why exactly these_? It seems to me that we just picked a subset of 
configuration options (by some rule yet unknown to me) and made them 
configurable like that. Why is it already predefined which properties may 
appear in the sstable_format option? What would be interesting is being able 
to reference _every parameter_ instead of choosing from a static set.

Also, what does crc_check_chance, for example, have to do with _sstable 
format_? There is no "format" behind it. crc_check_chance (like 
bloom_filter_fp_chance) is just a _probability_ with which we perform some 
operation. That is an operational parameter; we are not _formatting an 
sstable_ as such. Maybe it is just a matter of naming, but I find it important 
to mention.
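To illustrate why crc_check_chance reads as an operational knob rather than a 
layout property, here is a minimal Java sketch in which the value is consulted 
on every read to decide whether a chunk's CRC32 is verified. CrcCheckSketch and 
its methods are hypothetical and are not Cassandra's reader code; the point is 
only that the bytes on disk are identical whatever the value is.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical sketch, not Cassandra's reader: crc_check_chance is consulted per read.
final class CrcCheckSketch {
    private final double crcCheckChance; // 0.0 = never verify, 1.0 = always verify
    private final Random random;

    CrcCheckSketch(double crcCheckChance, Random random) {
        if (crcCheckChance < 0.0 || crcCheckChance > 1.0)
            throw new IllegalArgumentException("crc_check_chance must be in [0, 1]");
        this.crcCheckChance = crcCheckChance;
        this.random = random;
    }

    // Random.nextDouble() is in [0, 1), so 0.0 never verifies and 1.0 always does.
    boolean shouldCheckCrc() {
        return random.nextDouble() < crcCheckChance;
    }

    // Returns false only if verification ran and the checksum did not match.
    boolean read(byte[] chunk, long storedCrc) {
        if (!shouldCheckCrc())
            return true; // skipped: the chunk is accepted unverified
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        return crc.getValue() == storedCrc;
    }

    public static void main(String[] args) {
        CrcCheckSketch reader = new CrcCheckSketch(0.1, new Random(42));
        int verified = 0;
        for (int i = 0; i < 10_000; i++)
            if (reader.shouldCheckCrc()) verified++;
        System.out.println("verified " + verified + " of 10000 reads");
    }
}
```

Changing the value changes how often reads pay for verification, not what an 
sstable looks like, which is the distinction made above.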

Also, do you think it is possible and useful to let sstable_format contain 
custom parameters? If we have a way to specify a custom SSTable format by 
implementing AbstractSSTableFormat, then such a format might accept additional 
parameters which would be added to sstable_format like this:

{code}
... sstable_format = {'type': 'mytype', 'myparameter': 'abc'}
{code}
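A minimal sketch of how such pass-through parameters could be handled, 
assuming the sstable_format map has already been parsed into a 
Map<String, String>. Only the "type" key is interpreted generically; every 
other key is handed to the selected format, which validates its own options. 
FormatSketch, SSTableFormatOptionsSketch and MyFormat are hypothetical names 
and are not the AbstractSSTableFormat API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical: what any sstable format would expose once built from its options.
interface FormatSketch {
    String name();
    Map<String, String> options();
}

// Hypothetical registry: interprets only "type", passes every other key through.
final class SSTableFormatOptionsSketch {
    private final Map<String, Function<Map<String, String>, FormatSketch>> factories = new HashMap<>();

    void register(String type, Function<Map<String, String>, FormatSketch> factory) {
        factories.put(type, factory);
    }

    FormatSketch resolve(Map<String, String> sstableFormat) {
        String type = sstableFormat.get("type");
        if (type == null)
            throw new IllegalArgumentException("sstable_format requires a 'type' entry");
        Function<Map<String, String>, FormatSketch> factory = factories.get(type);
        if (factory == null)
            throw new IllegalArgumentException("unknown sstable format type: " + type);
        Map<String, String> custom = new HashMap<>(sstableFormat);
        custom.remove("type");
        return factory.apply(Map.copyOf(custom)); // the format validates its own keys
    }
}

// Hypothetical custom format that accepts exactly one extra parameter.
record MyFormat(String myParameter) implements FormatSketch {
    private static final Set<String> KNOWN_OPTIONS = Set.of("myparameter");

    static MyFormat fromOptions(Map<String, String> options) {
        for (String key : options.keySet())
            if (!KNOWN_OPTIONS.contains(key))
                throw new IllegalArgumentException("mytype does not accept option: " + key);
        return new MyFormat(options.getOrDefault("myparameter", "default"));
    }

    @Override public String name() { return "mytype"; }
    @Override public Map<String, String> options() { return Map.of("myparameter", myParameter); }
}
```

With registry.register("mytype", MyFormat::fromOptions), resolving 
{type=mytype, myparameter=abc} yields a MyFormat, while an unknown key is 
rejected by the format itself rather than by a central, fixed list of 
properties.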

That means we would not need to implement every custom parameter out there for 
every format. The tricky part is that if we allow custom parameters to be 
specified, then altering them would produce a different schema version, which 
would need to be propagated across the cluster.
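The schema-version concern can be illustrated with a minimal sketch, assuming 
(for illustration only) that a schema version is a digest over a canonical 
rendering of table options. SchemaVersionSketch is hypothetical and 
deliberately simpler than Cassandra's real schema versioning; it only shows 
that changing a custom key, even one the database never interprets, yields a 
new version that every node must converge on.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical sketch: a schema version derived from a digest of table options.
// Cassandra's actual schema versioning differs; only the principle is shown.
final class SchemaVersionSketch {
    static UUID versionOf(Map<String, Map<String, String>> tableOptions) {
        StringBuilder canonical = new StringBuilder();
        // Sort outer and inner keys so map iteration order cannot affect the result.
        for (Map.Entry<String, Map<String, String>> option : new TreeMap<>(tableOptions).entrySet())
            canonical.append(option.getKey()).append('=')
                     .append(new TreeMap<>(option.getValue())).append(';');
        // Name-based (MD5, type 3) UUID over the canonical text.
        return UUID.nameUUIDFromBytes(canonical.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID before = versionOf(Map.of("sstable_format", Map.of("type", "mytype", "myparameter", "abc")));
        UUID after = versionOf(Map.of("sstable_format", Map.of("type", "mytype", "myparameter", "xyz")));
        System.out.println(before.equals(after) ? "same version" : "new schema version to propagate");
    }
}
```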


was (Author: smiklosovic):
I would prefer to merge CASSANDRA-18872 first, which removes crc_check_chance 
from compression. That means 5.0 will _not_ have crc_check_chance in 
compression anymore.

Then you could rebase this work against 5.0, where crc_check_chance will no 
longer be in compression, and tweak the FileHandler builder to propagate the 
sstable format option there so that the two are aligned.

[~maxwellguo] [~blambov] how does this sound to you?

BTW I think this ticket as a whole needs an ML thread. We are changing CQL 
here and it would be great to involve more people in this.

What seems to be a little bit "strange" to me is that we chose these properties:

row_index_granularity
bloom_filter_fp_chance
crc_check_chance
min/max_index_interval

But _why exactly these_? Also, what does crc_check_chance, for example, have 
to do with _sstable format_? There is no "format" behind it. crc_check_chance 
(like bloom_filter_fp_chance) is just a _probability_ with which we perform 
some operation. That is an operational parameter; we are not _formatting an 
sstable_ as such. Maybe it is just a matter of naming, but I find it important 
to mention.

Also, do you think it is possible and useful to let sstable_format contain 
custom parameters? If we have a way to specify a custom SSTable format by 
implementing AbstractSSTableFormat, then such a format might accept additional 
parameters which would be added to sstable_format like this:

{code}
... sstable_format = {'type': 'mytype', 'myparameter': 'abc'}
{code}

That means we would not need to implement every custom parameter out there for 
every format. The tricky part is that if we allow custom parameters to be 
specified, then altering them would produce a different schema version, which 
would need to be propagated across the cluster.

> Make sstable format configurable per table
> ------------------------------------------
>
>                 Key: CASSANDRA-18534
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18534
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Schema, Local/SSTable
>            Reporter: Branimir Lambov
>            Assignee: Maxwell Guo
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Some SSTable format settings need to be configurable per table for better 
> efficiency. This includes:
>  - {{row_index_granularity}}
>  - {{bloom_filter_fp_chance}}
>  - {{crc_check_chance}}
>  - {{min/max_index_interval}}
> Some of these are currently configurable using direct properties of tables. 
> Having them as format properties makes better sense and should also support 
> specifying usable combinations of settings, e.g.
> {code:java}
> CREATE TABLE ... WITH sstable_format = 'bti-fast';
> CREATE TABLE ... WITH sstable_format = 'bti-small';
> {code}
> where {{bti-fast}} and {{bti-small}} can be defined in {{cassandra.yaml}} 
> e.g. as
> {code:java}
> sstable.format.options:
>   - bti-fast:
>       row_index_granularity: 1kiB
>       bloom_filter_fp_chance: 0.01
>   - bti-small:
>       row_index_granularity: 32kiB
>       bloom_filter_fp_chance: 0.1
> {code}
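The named presets in the quoted description could be resolved roughly as 
follows. This is a minimal sketch under stated assumptions: FormatPreset, 
PresetRegistrySketch and parseSize are hypothetical, the YAML is assumed to be 
parsed into nested maps already, and the bloom filter sizing uses the textbook 
optimum (bits per key = -ln p / (ln 2)^2) rather than whatever Cassandra uses 
internally.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical resolved preset: the settings a named sstable_format would carry.
record FormatPreset(long rowIndexGranularityBytes, double bloomFilterFpChance) {
    // Textbook optimal bloom filter size for false-positive rate p.
    double bloomBitsPerKey() {
        return -Math.log(bloomFilterFpChance) / (Math.log(2) * Math.log(2));
    }
}

// Hypothetical registry built from sstable.format.options in cassandra.yaml.
final class PresetRegistrySketch {
    private final Map<String, FormatPreset> presets;

    PresetRegistrySketch(Map<String, Map<String, String>> yamlOptions) {
        Map<String, FormatPreset> parsed = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> e : yamlOptions.entrySet()) {
            Map<String, String> o = e.getValue();
            parsed.put(e.getKey(), new FormatPreset(
                    parseSize(o.get("row_index_granularity")),
                    Double.parseDouble(o.get("bloom_filter_fp_chance"))));
        }
        presets = Map.copyOf(parsed);
    }

    // Looks up the preset named in CREATE TABLE ... WITH sstable_format = '<name>'.
    FormatPreset resolve(String name) {
        FormatPreset preset = presets.get(name);
        if (preset == null)
            throw new IllegalArgumentException("unknown sstable_format preset: " + name);
        return preset;
    }

    // Minimal size parser for values such as "1kiB" or "32kiB" (case-insensitive).
    static long parseSize(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.endsWith("mib")) return Long.parseLong(v.substring(0, v.length() - 3)) * 1024 * 1024;
        if (v.endsWith("kib")) return Long.parseLong(v.substring(0, v.length() - 3)) * 1024;
        if (v.endsWith("b")) return Long.parseLong(v.substring(0, v.length() - 1));
        throw new IllegalArgumentException("unsupported size: " + value);
    }
}
```

Under this formula bti-fast (p = 0.01) spends about 9.6 bits per key on its 
bloom filter against about 4.8 for bti-small (p = 0.1), which is the kind of 
size/speed trade-off a named combination is meant to capture.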



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
