[
https://issues.apache.org/jira/browse/CASSANDRA-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697966#action_12697966
]
Eric Evans commented on CASSANDRA-51:
-------------------------------------
In my test environment I've exhausted the JVM's heap and sent Cassandra into
hours-long thrashing, which typically culminates in an out-of-memory exception
and a premature end to the test. That is my own motivation for finding an
optimization to EBM's memory utilization.
However, whether such an optimization is made or not, there's still bound to be
some non-Column overhead which accumulates as the number of columns increases.
The smaller the stored values are, the more space this overhead is going to
consume on the heap, despite the fact that it isn't reported by currentSize_.
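
To put rough numbers on the point above, here is a small arithmetic sketch. The
64-byte per-column overhead is an assumption chosen for illustration, not a
measured figure, and the class and method names are hypothetical. The sketch
only shows how the unreported share of the heap grows as values shrink.

```java
class MemtableOverheadSketch {
    // Assumption for illustration only: bookkeeping bytes per column
    // (object headers, map/set entries, references). Not a measured value.
    static final long ASSUMED_OVERHEAD_BYTES = 64;

    /** Bytes a counter would report if it only tracked the stored values. */
    static long reportedBytes(long columns, long valueBytes) {
        return columns * valueBytes;
    }

    /** Rough heap estimate: stored values plus the assumed per-column overhead. */
    static long estimatedHeapBytes(long columns, long valueBytes) {
        return columns * (valueBytes + ASSUMED_OVERHEAD_BYTES);
    }

    /** Fraction of the estimated heap use that the counter does not see. */
    static double unreportedFraction(long valueBytes) {
        return (double) ASSUMED_OVERHEAD_BYTES / (valueBytes + ASSUMED_OVERHEAD_BYTES);
    }

    public static void main(String[] args) {
        long columns = 1_000_000;
        for (long valueBytes : new long[] {8, 64, 1024}) {
            System.out.printf(
                "value=%d B  reported=%d B  estimated heap=%d B  unreported=%.0f%%%n",
                valueBytes,
                reportedBytes(columns, valueBytes),
                estimatedHeapBytes(columns, valueBytes),
                100 * unreportedFraction(valueBytes));
        }
    }
}
```
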
The trick to preventing an out of memory crash would seem to be the careful
tuning of MemtableSizeInMB and MemtableObjectCountInMillions to both the
allocated heap size and the type of data being stored.
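
As a minimal sketch of how two such limits could interact, the class below
flushes when either a byte limit or an object-count limit is reached. It is not
the actual Memtable code; the class, fields, and method names are hypothetical,
and only the two setting names come from the discussion above.

```java
class FlushThresholdSketch {
    private final long maxBytes;    // derived from a MemtableSizeInMB-style setting
    private final long maxObjects;  // derived from a MemtableObjectCountInMillions-style setting
    private long currentBytes;
    private long currentObjects;

    FlushThresholdSketch(int sizeInMB, double objectCountInMillions) {
        this.maxBytes = sizeInMB * 1024L * 1024L;
        this.maxObjects = Math.round(objectCountInMillions * 1_000_000);
    }

    /** Account for one stored column; only the value size is counted. */
    void recordColumn(long valueBytes) {
        currentBytes += valueBytes;
        currentObjects++;
    }

    /** Flush as soon as either limit is reached. */
    boolean shouldFlush() {
        return currentBytes >= maxBytes || currentObjects >= maxObjects;
    }

    public static void main(String[] args) {
        // Illustrative values only: 128 MB and 1 million objects.
        FlushThresholdSketch memtable = new FlushThresholdSketch(128, 1.0);
        long written = 0;
        while (!memtable.shouldFlush()) {
            memtable.recordColumn(8); // tiny values: the count limit trips first
            written++;
        }
        System.out.println("columns written before flush: " + written);
    }
}
```

With small values the object-count limit is what bounds the accumulated
per-column overhead, which is why both settings need to be chosen with the heap
size and the data shape in mind.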
Currently these values aren't well advertised (I didn't know about them until I
started digging around in Memtable), so I propose the attached patch which
includes them in the sample configuration, and supplies a more conservative
value for MemtableSizeInMB.
> Memory footprint for memtable
> ------------------------------
>
> Key: CASSANDRA-51
> URL: https://issues.apache.org/jira/browse/CASSANDRA-51
> Project: Cassandra
> Issue Type: Improvement
> Environment: all
> Reporter: Sandeep Tata
> Assignee: Eric Evans
> Fix For: 0.3
>
>
> The implementation of EfficientBidiMap (EBM) today stores the column in two
> places, a map and a sorted set. Both data structures store exactly the same
> values.
> I assume we're storing this twice so that the map can give us O(1) reads
> while the sortedset is important for efficient flush. Is this tradeoff
> important? Do we want to store the data twice to get O(1) reads over
> O(log(n)) reads from sortedset? Is the sortedset implementation broken?
> Perhaps we should consider a configuration option that turns off the map --
> write performance will be slightly improved, read performance will be
> somewhat worse, and the memory footprint will probably be about half.
> Certainly sounds like a good alternative tradeoff.
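
The tradeoff raised in the description above can be sketched roughly as follows.
This is a simplified illustration with string names and values, not the actual
EfficientBidiMap code; DualStore and SortedOnlyStore are hypothetical names. The
first layout keeps every entry in both a hash map and a sorted set; the second
keeps a single sorted structure and accepts O(log n) reads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class ColumnStoreSketch {
    /** Two structures: hash map for O(1) reads, sorted set for ordered flush. */
    static class DualStore {
        private final Map<String, String> byName = new HashMap<>();
        private final TreeSet<String> sortedNames = new TreeSet<>();

        void put(String name, String value) {
            byName.put(name, value);   // every write touches both structures
            sortedNames.add(name);
        }

        String get(String name) {
            return byName.get(name);   // O(1) expected
        }

        List<String> flushOrder() {
            return new ArrayList<>(sortedNames);
        }
    }

    /** One sorted structure: O(log n) reads, ordered flush, one entry per column. */
    static class SortedOnlyStore {
        private final TreeMap<String, String> sorted = new TreeMap<>();

        void put(String name, String value) {
            sorted.put(name, value);
        }

        String get(String name) {
            return sorted.get(name);   // O(log n)
        }

        List<String> flushOrder() {
            return new ArrayList<>(sorted.keySet());
        }
    }

    public static void main(String[] args) {
        DualStore dual = new DualStore();
        SortedOnlyStore single = new SortedOnlyStore();
        for (String name : new String[] {"c", "a", "b"}) {
            dual.put(name, "value-" + name);
            single.put(name, "value-" + name);
        }
        System.out.println(dual.get("a").equals(single.get("a")));
        System.out.println(dual.flushOrder().equals(single.flushOrder()));
    }
}
```

Both layouts return the same values and the same flush order; they differ in
read cost and in how many entries are held per column.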
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.