[
https://issues.apache.org/jira/browse/CASSANDRA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697893#action_12697893
]
Prashant Malik commented on CASSANDRA-51:
-----------------------------------------
EBM was needed so that we could have time-sorted columns and still access
per-column data.
Also, the memory footprint is not that high, since only references are stored in
the sorted list; it is not an object copy.
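A minimal Java sketch of the layout described above, under stated assumptions: one map gives per-column access by name, one set keeps the columns ordered by timestamp, and both hold the same object reference rather than a copy. The class and member names (`BidiColumnIndexSketch`, `Column`, `byName`, `byTime`, `put`, `get`, `inTimeOrder`) are hypothetical; this is not the actual `EfficientBidiMap` code.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of a two-view column index; NOT Cassandra's EfficientBidiMap.
public class BidiColumnIndexSketch {
    // Hypothetical stand-in for a column: a name, a value and a write timestamp.
    static final class Column {
        final String name;
        final byte[] value;
        final long timestamp;

        Column(String name, byte[] value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // View 1: average O(1) lookup by column name.
    private final Map<String, Column> byName = new HashMap<>();

    // View 2: iteration in timestamp order (e.g. for a flush). Ties are broken by
    // name so distinct columns with equal timestamps are not treated as duplicates.
    private final TreeSet<Column> byTime = new TreeSet<>(
            Comparator.comparingLong((Column c) -> c.timestamp)
                      .thenComparing((Column c) -> c.name));

    void put(Column column) {
        Column previous = byName.put(column.name, column);
        if (previous != null) {
            byTime.remove(previous); // keep both views consistent on overwrite
        }
        byTime.add(column); // the same reference goes into both views, no copy
    }

    Column get(String name) {
        return byName.get(name);
    }

    Iterable<Column> inTimeOrder() {
        return byTime;
    }

    public static void main(String[] args) {
        BidiColumnIndexSketch index = new BidiColumnIndexSketch();
        index.put(new Column("b", new byte[] {2}, 20L));
        index.put(new Column("a", new byte[] {1}, 10L));
        // Both views point at the very same Column instance.
        System.out.println(index.get("a") == index.byTime.first()); // true
        for (Column c : index.inTimeOrder()) {
            System.out.println(c.name + "@" + c.timestamp); // a@10, then b@20
        }
    }
}
```

In this layout each column is stored once as an object; the second structure adds only its own per-entry overhead, not a second copy of the column data.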
In our measurements, using just the sorted list and searching it in O(log n) time
gave us a space saving of 1.2X, but throughput under high read traffic
suffered,
so storing it the way it is was the better tradeoff.
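For comparison, a Java sketch of the single-structure alternative: one list of column references kept sorted, with lookups done by binary search in O(log n) time instead of an O(1) hash lookup. It assumes the list is ordered by column name; the names (`SortedColumnListSketch`, `Column`, `put`, `get`, `names`) are hypothetical, and the 1.2X figure above comes from the commenter's measurements, not from this code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a single name-sorted list searched with binary search.
public class SortedColumnListSketch {
    // Hypothetical column; ordering uses the name only (assumption).
    static final class Column implements Comparable<Column> {
        final String name;
        final long timestamp;

        Column(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }

        @Override
        public int compareTo(Column other) {
            return name.compareTo(other.name);
        }
    }

    // The only copy of the references, kept sorted by name.
    private final List<Column> columns = new ArrayList<>();

    void put(Column column) {
        int i = Collections.binarySearch(columns, column);
        if (i >= 0) {
            columns.set(i, column);      // overwrite the existing column
        } else {
            columns.add(-i - 1, column); // insert at the sorted position (O(n) shift)
        }
    }

    // O(log n) lookup via binary search with a probe column.
    Column get(String name) {
        int i = Collections.binarySearch(columns, new Column(name, 0L));
        return i >= 0 ? columns.get(i) : null;
    }

    List<String> names() {
        List<String> out = new ArrayList<>();
        for (Column c : columns) {
            out.add(c.name);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedColumnListSketch list = new SortedColumnListSketch();
        list.put(new Column("c", 3L));
        list.put(new Column("a", 1L));
        list.put(new Column("b", 2L));
        System.out.println(list.names());            // [a, b, c]
        System.out.println(list.get("b").timestamp); // 2
    }
}
```

Dropping the map saves its per-entry overhead, but every read now pays for a binary search, and each insert into an `ArrayList` also shifts elements in O(n); this is the kind of read-path cost the comment weighs against the memory saving.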
There are better ways to do this, but they might involve building custom data
structures and treating the name-sorted structure differently from the time-sorted one.
> Memory footprint for memtable
> ------------------------------
>
> Key: CASSANDRA-51
> URL: https://issues.apache.org/jira/browse/CASSANDRA-51
> Project: Cassandra
> Issue Type: Improvement
> Environment: all
> Reporter: Sandeep Tata
> Assignee: Eric Evans
> Fix For: 0.3
>
>
> The implementation of EfficientBidiMap (EBM) today stores the column in two
> places, a map and a sorted set. Both data structures store exactly the same
> values.
> I assume we're storing this twice so that the map can give us O(1) reads
> while the sorted set is important for efficient flush. Is this tradeoff
> important? Do we want to store the data twice to get O(1) reads over
> O(log(n)) reads from the sorted set? Is the sorted set implementation broken?
> Perhaps we should consider a configuration option that turns off the map --
> write performance will be slightly improved, read performance will be
> somewhat worse, and the memory footprint will probably be about half.
> Certainly sounds like a good alternative tradeoff.