Aleksey Yeschenko created CASSANDRA-6945:
--------------------------------------------
Summary: Calculate liveRatio on a per-memtable basis, not per-CF
Key: CASSANDRA-6945
URL: https://issues.apache.org/jira/browse/CASSANDRA-6945
Project: Cassandra
Issue Type: Bug
Reporter: Aleksey Yeschenko
Assignee: Aleksey Yeschenko
Currently we recalculate the live ratio each time the number of write ops to
the CF doubles, not the number of write ops to an individual memtable. The
value itself is also CF-bound, not memtable-bound. This causes several issues:
1. Depending on what stage the current memtable is at, the calculated live
ratio can vary *a lot*
2. That calculated live ratio can stay in place for quite a while - each
recalculation waits for the CF's write op count to double, so the longer the
C* process has been running, the longer the value stays incorrect
3. Incorrect live ratio means inefficient MeteredFlusher - flushing less or
more often than needed, picking bad candidates for flushing, etc.
4. Incorrect live ratio means incorrect size returned to the metrics consumers
5. Compaction strategies that rely on memtable size estimation are affected
6. All of the above is slightly amplified by the fact that all the memtables
pending flush would also use that one incorrect value
Depending on the stage the current memtable is at when the live ratio is
recalculated, the calculated value can be *extremely* wrong (say, a recently
created, fresh memtable would have a much higher than average live ratio).
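To make the size of the error concrete, here is a small sketch. It assumes
the estimated live size is the serialized bytes written multiplied by the
live ratio; the class name, method name and all numbers are made up for
illustration and are not measurements.

```java
public class LiveRatioSkew {
    /** Assumed estimate: serialized bytes written times the live ratio. */
    static long estimateLiveBytes(long serializedBytes, double liveRatio) {
        return Math.round(serializedBytes * liveRatio);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        double ratioFromFreshMemtable = 20.0; // CF-wide value, measured right after a switch
        double ratioOfFullMemtable = 5.0;     // what a well-filled memtable might really have
        long serializedBytes = 100L * 1024 * 1024;

        long plausible = estimateLiveBytes(serializedBytes, ratioOfFullMemtable);
        long reported = estimateLiveBytes(serializedBytes, ratioFromFreshMemtable);

        // With these numbers the CF-wide ratio overstates the memtable 4x.
        System.out.println("plausible=" + plausible + " reported=" + reported);
    }
}
```

The same skewed value would feed both the flush decisions and the reported
metrics until the next recalculation.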
The suggested fix is to bind the live ratio to individual memtables, not to
column families as a whole, with some optimizations to make recalculations run
less often by inheriting the previous memtable's stats.
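A rough sketch of what the suggested fix could look like, not the actual
patch. MemtableSketch and every member in it are hypothetical names, the
ratio bounds are assumed values, and halving the inherited threshold is just
one possible way to "inherit the previous memtable's stats".

```java
/**
 * Hypothetical sketch of a memtable that owns its live ratio.
 * None of these names are taken from the Cassandra code base.
 */
public class MemtableSketch {
    // Illustrative sanity bounds for a measured ratio (assumed values).
    private static final double MIN_RATIO = 1.0;
    private static final double MAX_RATIO = 64.0;

    private double liveRatio;      // heap bytes per serialized byte, for this memtable
    private long serializedBytes;  // serialized bytes written to this memtable
    private long writeOps;         // write ops applied to this memtable
    private long nextRecalcAt;     // op count at which to measure again

    /** First memtable of a column family: no history, so start from a guess. */
    public MemtableSketch(double initialRatio) {
        this.liveRatio = initialRatio;
        this.nextRecalcAt = 1;
    }

    /** Replacement memtable: inherit the predecessor's ratio and schedule. */
    public MemtableSketch(MemtableSketch previous) {
        synchronized (previous) {
            this.liveRatio = previous.liveRatio;
            // Assumption: the inherited ratio is a decent starting point, so
            // the first measurement can wait for half the old threshold.
            this.nextRecalcAt = Math.max(1, previous.nextRecalcAt / 2);
        }
    }

    /** Records one write; returns true when the caller should re-measure. */
    public synchronized boolean recordWrite(long bytes) {
        serializedBytes += bytes;
        writeOps++;
        return writeOps >= nextRecalcAt;
    }

    /** Applies a heap measurement; next one is due at twice the current op count. */
    public synchronized void updateLiveRatio(long measuredHeapBytes) {
        if (serializedBytes > 0) {
            double measured = (double) measuredHeapBytes / serializedBytes;
            liveRatio = Math.min(MAX_RATIO, Math.max(MIN_RATIO, measured));
        }
        nextRecalcAt = Math.max(1, writeOps * 2);
    }

    public synchronized double liveRatio() {
        return liveRatio;
    }

    /** Estimated heap footprint of this memtable only. */
    public synchronized long estimatedLiveBytes() {
        return Math.round(serializedBytes * liveRatio);
    }
}
```

In this sketch the op counter and the ratio both live on the memtable, so a
memtable pending flush keeps the value measured for it, and a new memtable
starts from its predecessor's ratio instead of a CF-wide one.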
--
This message was sent by Atlassian JIRA
(v6.2#6252)