[ https://issues.apache.org/jira/browse/CASSANDRA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697931#action_12697931 ]

Jonathan Ellis commented on CASSANDRA-51:
-----------------------------------------

What I think makes the memory overhead issue more confusing is that we 
meticulously report every byte used for Column for the "is it time to flush 
yet" method, but ignore overhead from SuperColumn and ColumnFamily internals.

IMO we should either include estimates for these factors or count only the
(key + value + timestamp) size of each Column, so that at least we're consistent.
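
A minimal sketch of the "consistent" option, assuming a simplified column made of a UTF-8 name, a byte[] value, and a long timestamp. `SimpleColumn`, `SizeTrackingMemtable`, and `shouldFlush()` are hypothetical names for illustration, not Cassandra's actual classes. Every column is charged only key + value + timestamp bytes, so JVM object overhead is ignored uniformly rather than counted for Column but not for SuperColumn or ColumnFamily.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified column: name (key), value, and timestamp. */
final class SimpleColumn {
    final String name;
    final byte[] value;
    final long timestamp;

    SimpleColumn(String name, byte[] value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }

    /** Counts only key + value + timestamp bytes; object overhead is deliberately ignored. */
    int serializedSize() {
        return name.getBytes(StandardCharsets.UTF_8).length + value.length + Long.BYTES;
    }
}

/** Hypothetical memtable that keeps a running size for the "is it time to flush yet" check. */
final class SizeTrackingMemtable {
    private final long flushThresholdBytes;
    private final List<SimpleColumn> columns = new ArrayList<>();
    private long currentSize = 0;

    SizeTrackingMemtable(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    void put(SimpleColumn c) {
        columns.add(c);
        currentSize += c.serializedSize(); // same accounting rule for every column
    }

    long currentSize() {
        return currentSize;
    }

    /** True once the tracked (key + value + timestamp) total reaches the threshold. */
    boolean shouldFlush() {
        return currentSize >= flushThresholdBytes;
    }
}
```

For example, with a 30-byte threshold, a column named "a" with a 4-byte value counts as 1 + 4 + 8 = 13 bytes; adding "bb" with an 8-byte value brings the total to 31 and trips the check.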

>  Memory footprint for memtable
> ------------------------------
>
>                 Key: CASSANDRA-51
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-51
>             Project: Cassandra
>          Issue Type: Improvement
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Eric Evans
>             Fix For: 0.3
>
>
> The implementation of EfficientBidiMap (EBM) today stores the column in two
> places, a map and a sorted set. Both data structures store exactly the same
> values.
> I assume we're storing this twice so that the map can give us O(1) reads
> while the sorted set is important for efficient flush. Is this tradeoff
> important? Do we want to store the data twice to get O(1) reads over
> O(log(n)) reads from the sorted set? Is the sorted set implementation broken?
> Perhaps we should consider a configuration option that turns off the map -- 
> write performance will be slightly improved, read performance will be 
> somewhat worse, and the memory footprint will probably be about half. 
> Certainly sounds like a good alternative tradeoff.
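
A minimal sketch of the proposed configuration option, assuming columns are keyed by name. `ColumnIndex` and its `useHashIndex` flag are hypothetical and are not EfficientBidiMap's actual API; a `TreeMap` keyed by column name stands in for the sorted set. With the flag on, every put goes to both a `HashMap` (O(1) average lookup) and the `TreeMap` (sorted iteration for flush); with it off, only the `TreeMap` is kept and reads cost O(log n).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical column container illustrating the EBM trade-off.
 * useHashIndex = true : each column is indexed in a HashMap and a TreeMap.
 * useHashIndex = false: only the TreeMap is kept; lookups become O(log n).
 */
final class ColumnIndex {
    private final boolean useHashIndex;
    private final Map<String, byte[]> hashIndex;                         // optional O(1) read path
    private final NavigableMap<String, byte[]> sorted = new TreeMap<>(); // always kept for flush

    ColumnIndex(boolean useHashIndex) {
        this.useHashIndex = useHashIndex;
        if (useHashIndex) {
            this.hashIndex = new HashMap<>();
        } else {
            this.hashIndex = null;
        }
    }

    void put(String name, byte[] value) {
        sorted.put(name, value);
        if (useHashIndex) {
            hashIndex.put(name, value); // second entry referencing the same value array
        }
    }

    byte[] get(String name) {
        // O(1) average with the hash index, O(log n) from the TreeMap otherwise.
        return useHashIndex ? hashIndex.get(name) : sorted.get(name);
    }

    /** Column names in sorted order, as a flush would write them. */
    Iterable<String> namesInFlushOrder() {
        return sorted.keySet();
    }
}
```

Both configurations return the same values and the same flush order; they differ only in read cost and in whether a second set of per-entry references is maintained.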

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
