Hi,

This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241

This ticket has languished for a while. IMO it's too late in 4.0 to implement a 
more memory-efficient representation for compressed chunk offsets. However, I 
don't think we should put out another release with the current 64k default 
chunk size, as it's pretty unreasonable.

I propose that we lower the default to 16k. 4k might never be the correct 
default anyway, as there is a cost to compression, and 16k will still be a 
large improvement.

Benedict and Jon Haddad are both +1 on making this change for 4.0. In the past 
there has been some consensus about reducing this value, although perhaps 
together with a more memory-efficient representation.

The napkin math for what this costs is:
"If you have 1TB of uncompressed data, with 64k chunks that's 16M chunks at 8 
bytes each (128MB).
With 16k chunks, that's 512MB.
With 4k chunks, it's 2G.
Per terabyte of data (pre-compression)."
https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
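
A quick sketch of the arithmetic quoted above, in Python. It assumes binary 
units (1TB = 1024^4 bytes) and exactly 8 bytes of offset metadata per chunk, 
as in the quote. The function name is made up for illustration and is not 
Cassandra code.

```python
# Napkin math only: 8 bytes of offset metadata per compressed chunk,
# binary units throughout. Not taken from the Cassandra code base.
TIB = 1024 ** 4        # 1TB of uncompressed data
OFFSET_BYTES = 8       # assumed cost of one chunk offset


def offset_overhead_bytes(data_bytes, chunk_kb):
    """Memory needed to hold one offset per chunk of chunk_kb kilobytes."""
    chunks = data_bytes // (chunk_kb * 1024)
    return chunks * OFFSET_BYTES


for chunk_kb in (64, 16, 4):
    mb = offset_overhead_bytes(TIB, chunk_kb) / 1024 ** 2
    print(f"{chunk_kb}k chunks: {mb:.0f}MB per TB")
```

This gives 128MB, 512MB and 2048MB (2G) per terabyte for 64k, 16k and 4k 
chunks respectively, matching the figures in the quote.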

By way of comparison, memory mapping the files has a similar cost of 8 bytes 
per 4k page. Multiple mappings make this more expensive. With a default of 16k, 
the chunk offsets would be 4x less expensive than memory mapping a file. I only 
mention this to give a sense of the costs we are already paying. I am not 
saying they are directly related.
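
The 4x figure can be checked the same way. This is only a sketch of the 
comparison: it assumes 8 bytes per 4k page for a single mapping and 8 bytes 
per chunk offset, and it ignores multiple mappings. The names are 
illustrative.

```python
# Compare bookkeeping cost per byte of data: one 8-byte entry per 4k page
# (memory mapping) versus one 8-byte offset per compressed chunk.
PAGE_BYTES = 4 * 1024
ENTRY_BYTES = 8


def cost_per_data_byte(unit_bytes):
    """Bookkeeping bytes spent per byte of data for a given unit size."""
    return ENTRY_BYTES / unit_bytes


def mmap_cost_ratio(chunk_kb):
    """How many times cheaper chunk offsets are than page entries."""
    return cost_per_data_byte(PAGE_BYTES) / cost_per_data_byte(chunk_kb * 1024)


print(mmap_cost_ratio(16))
```

Under these assumptions 4k chunks cost the same as a single mapping, and 
each 4x increase in chunk size makes the offsets 4x cheaper.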

I'll wait a week for discussion and, if there is consensus, make the change.

Regards,
Ariel
