CASSANDRA-13241 lower default chunk_length_in_kb

Ariel Weisberg Thu, 11 Oct 2018 16:12:02 -0700

Hi,

This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241

This ticket has languished for a while. IMO it's too late in the 4.0 cycle to
implement a more memory-efficient representation for compressed chunk offsets.
However, I don't think we should put out another release with the current 64k
default, as it's pretty unreasonable.

I propose that we lower the value to 16kb. 4k might never be the correct
default anyway, since smaller chunks carry a compression cost, and 16k will
still be a large improvement.

Benedict and Jon Haddad are both +1 on making this change for 4.0. In the past
there has been some consensus about reducing this value, although possibly
alongside a more memory-efficient offset representation.

The napkin math for what this costs is:
"If you have 1TB of uncompressed data, with 64k chunks that's 16M chunks at 8
bytes each (128MB).
With 16k chunks, that's 512MB.
With 4k chunks, it's 2G.
Per terabyte of data (pre-compression)."
https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
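To make the napkin math easy to re-run for other data sizes, here is a minimal sketch of the same arithmetic. It assumes, as the quoted estimate does, one 8-byte offset per compressed chunk, and it uses binary units (1TB taken as 2^40 bytes), which is what makes the quoted figures come out exactly. The helper name `chunk_offset_bytes` is hypothetical and is not Cassandra code.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

# Assumption from the quoted estimate: one 8-byte offset kept per compressed chunk.
OFFSET_BYTES = 8


def chunk_offset_bytes(uncompressed_bytes: int, chunk_length_kb: int) -> int:
    """Hypothetical helper: bytes of chunk offsets needed for this much uncompressed data."""
    chunk_bytes = chunk_length_kb * KIB
    # Ceiling division: a partial last chunk still needs its own offset.
    chunks = -(-uncompressed_bytes // chunk_bytes)
    return chunks * OFFSET_BYTES


if __name__ == "__main__":
    for kb in (64, 16, 4):
        cost = chunk_offset_bytes(TIB, kb)
        print(f"{kb:>2}k chunks: {cost // OFFSET_BYTES:,} chunks, {cost // MIB:,} MiB of offsets")
```

Running it prints 128, 512, and 2,048 MiB of offsets for 64k, 16k, and 4k chunks respectively, matching the quoted 128MB / 512MB / 2G per terabyte. The cost scales linearly with data size and inversely with chunk length, so each 4x reduction in chunk length costs 4x more memory.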

By way of comparison, memory mapping the files has a similar cost of 8 bytes
per 4k page, and multiple mappings make this more expensive. With a 16kb
default, the chunk offsets would be 4x less expensive than memory mapping a
file. I only mention this to give a sense of the costs we are already paying; I
am not saying they are directly related.
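The 4x figure follows from tracking 8 bytes per 4k page versus 8 bytes per 16k chunk. The sketch below checks that ratio under stated assumptions: it treats the mapped size as equal to the uncompressed data size for simplicity (which overstates the mmap side when files are compressed on disk), and the `mappings` multiplier is a hypothetical way to model "multiple mappings" by assuming each mapping pays the full per-page cost independently. The helper name `tracking_bytes` is illustrative only.

```python
KIB = 1024
TIB = 1024 ** 4

# Per the email: roughly 8 bytes per 4k mapped page, and 8 bytes per chunk offset.
ENTRY_BYTES = 8


def tracking_bytes(data_bytes: int, unit_bytes: int, mappings: int = 1) -> int:
    """Hypothetical helper: 8 bytes per unit of data, times an assumed mapping multiplier."""
    units = -(-data_bytes // unit_bytes)  # ceiling division
    return units * ENTRY_BYTES * mappings


mmap_cost = tracking_bytes(TIB, 4 * KIB)     # one mapping of 1 TiB with 4 KiB pages
offsets_16k = tracking_bytes(TIB, 16 * KIB)  # chunk offsets with 16 KiB chunks

print(mmap_cost / offsets_16k)                                  # 4.0
print(tracking_bytes(TIB, 4 * KIB, mappings=2) / offsets_16k)   # 8.0
```

Under these assumptions a single mapping costs 4x the 16k offsets, and each additional mapping widens the gap proportionally.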

I'll wait a week for discussion and, if there is consensus, make the change.

Regards,
Ariel
