[
https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212021#comment-13212021
]
Daniel Doubleday commented on CASSANDRA-2864:
---------------------------------------------
OK, for the curious, I just wanted to report some findings.
Disclaimer: I ignored counters and super cols for the time being.
I did some testing on various machines with different CPU / memory profiles.
I tried different read / write / overwrite scenarios with reads being
normally distributed so I could configure cache hit ratios while keeping
everything else constant.
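As an illustration of how that knob works, here is a minimal sketch (not the actual test harness; the class and parameter names are made up): with rows keyed 0..numRows-1, reads drawn from a Gaussian centered on the middle of the key space, and a cache holding the hottest rows, shrinking the standard deviation concentrates reads on fewer rows and raises the hit ratio without changing anything else.
{code:java}
import java.util.Random;

// Hypothetical key chooser for the read stress: smaller sigma -> more skew -> higher cache hit ratio.
public class KeyChooser
{
    private final Random random = new Random();
    private final int numRows;
    private final double sigma;

    public KeyChooser(int numRows, double sigma)
    {
        this.numRows = numRows;
        this.sigma = sigma;
    }

    public int nextKey()
    {
        // Gaussian centered on the middle of the key space, clamped to valid row keys
        long key = Math.round(numRows / 2.0 + random.nextGaussian() * sigma);
        return (int) Math.min(numRows - 1, Math.max(0, key));
    }
}
{code}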
I also tried to test the impact of different I/O loads by controlled stressing
of the disks.
One of my initial major concerns was also memory footprint: how many rows can
we fit into memory without getting into real trouble with GC.
Hm, the results are, well ... difficult. In a way I tend to think that we (or maybe
it's just me) are looking in the wrong direction. Right now I believe that in
the long run caching doesn't make sense at all, but for now I just want to
report some figures:
After the first round of real testing, the results looked ambivalent:
# The alternative cache is way superior in terms of memory usage and GC. In
general I found that I can fit around 10x as many rows in the cache.
# On the other hand, performance on rather CPU-restricted machines was worse
than I had hoped. In general it didn't really make a big difference whether I was
using the cache or only had a few memtables fully cached in the page cache.
Since this sucked, I looked at where all that CPU was burned and decided to change
the serialized row format and write custom name and slice filters. I figured
that the problem was that lots of objects are deserialized right now and there's
too much search scanning going on.
So now a row in memory looks like this:
|| Header || Column Data ||
Column offsets are encoded in the header. This way I can do binary searches and
don't need to scan.
Also, the filters only ever deserialize a column when it is actually returned as
a relevant column.
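To make that concrete, here is a minimal sketch of the lookup (the exact header layout and class names in the patch may differ; this assumes the header is an int column count followed by fixed-width int offsets, that each column is written with a short name-length prefix, and that a plain Comparator stands in for the column comparator):
{code:java}
import java.nio.ByteBuffer;
import java.util.Comparator;

// Hypothetical view over a serialized cached row: || header (count + offsets) || column data ||
public class SerializedRowIndex
{
    private final ByteBuffer row;
    private final int columnCount;

    public SerializedRowIndex(ByteBuffer row)
    {
        this.row = row;
        this.columnCount = row.getInt(0);
    }

    private int offsetAt(int i)
    {
        return row.getInt(4 + i * 4);
    }

    // Binary search over the offset table; returns the column's offset or -1, no scanning.
    public int findColumn(ByteBuffer name, Comparator<ByteBuffer> comparator)
    {
        int low = 0, high = columnCount - 1;
        while (low <= high)
        {
            int mid = (low + high) >>> 1;
            int cmp = comparator.compare(nameAt(offsetAt(mid)), name);
            if (cmp < 0)
                low = mid + 1;
            else if (cmp > 0)
                high = mid - 1;
            else
                return offsetAt(mid);
        }
        return -1;
    }

    // Reads only the (short-length-prefixed) name bytes; the column itself is not deserialized here.
    private ByteBuffer nameAt(int offset)
    {
        int nameLength = row.getShort(offset) & 0xFFFF;
        ByteBuffer name = row.duplicate();
        name.position(offset + 2).limit(offset + 2 + nameLength);
        return name.slice();
    }
}
{code}
The point being that a get only ever touches the name bytes it compares against instead of materializing column objects for the whole row.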
Before I write a book... Below are some figures. These are only meant to give a
broad idea; the total performance numbers don't mean anything. This was a 4-core
server with the tester threads running on the same machine. The machine was CPU
bound in all tests.
CPU bound? Yes - right now I still can't deliver anything really conclusive in
terms of what all this means for throughput (other than that I think caching is
the wrong answer). It's all about isolated cache performance so far.
h2. Memory Footprint
Note: the memory values are from JProfiler. I'm not sure if they are bulletproof,
but they should be in the right ballpark.
Payload estimate derived as name (variable), nameLength (2), value (variable),
valueLength (4), timestamp (8), local delete (4), type (1).
10k Rows, 500 Columns, 4byte names, 32byte value
Payload: 5M Columns: 275M
|| Cache || Retained Size || Num Objects ||
| Standard | 1,280 MB | 10M |
| ByteBuffer | 277 MB | 20k |
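As a quick check of the payload estimate above (illustrative arithmetic only, not code from the patch): with 4-byte names and 32-byte values that works out to 55 bytes per serialized column, and 10k rows x 500 columns gives 5M columns, i.e. roughly 275 MB of column data, which lines up with the 277 MB retained by the ByteBuffer cache.
{code:java}
// Back-of-the-envelope check of the payload estimate (illustration only)
public class PayloadEstimate
{
    public static void main(String[] args)
    {
        int perColumn = 4 + 2 + 32 + 4 + 8 + 4 + 1;   // name, nameLength, value, valueLength, timestamp, local delete, type = 55 bytes
        long totalBytes = 10000L * 500 * perColumn;   // 10k rows x 500 columns = 5M columns
        System.out.println(totalBytes);               // 275,000,000 bytes, roughly 275 MB of column data
    }
}
{code}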
10k Rows, 50 Columns, 4byte names, 1byte value
Payload: 500k Columns: 28M
|| Cache || Retained Size || Num Objects ||
| Standard | 112 MB | 900k |
| ByteBuffer | 30 MB | 20k |
h2. Performance
All rows had 500 cols with 32-byte values and int names/keys.
For a start, the following are simple 'as fast as you can' stress tests.
The performance indicator is pages / sec.
Name Filter: random get of one column
Slice Filter: random slice of 10 columns
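For reference, the two access patterns look roughly like this through the 0.8 Thrift API (the keyspace and column family names are made up, and this is not the actual stress code):
{code:java}
import java.nio.ByteBuffer;
import java.util.List;
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class ReadPatterns
{
    public static void main(String[] args) throws Exception
    {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("Test");                                   // hypothetical keyspace

        ByteBuffer key = ByteBuffer.wrap(new byte[]{ 0, 0, 0, 42 });   // int row key
        ByteBuffer col = ByteBuffer.wrap(new byte[]{ 0, 0, 0, 7 });    // int column name

        // Name filter: random get of one column
        ColumnPath path = new ColumnPath("Standard1").setColumn(col);
        ColumnOrSuperColumn one = client.get(key, path, ConsistencyLevel.ONE);

        // Slice filter: random slice of 10 columns starting at some column name
        SlicePredicate predicate = new SlicePredicate().setSlice_range(
                new SliceRange(col, ByteBuffer.wrap(new byte[0]), false, 10));
        List<ColumnOrSuperColumn> ten = client.get_slice(key, new ColumnParent("Standard1"), predicate, ConsistencyLevel.ONE);

        System.out.println(ten.size() + " columns in slice");
        transport.close();
    }
}
{code}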
Comparisons:
- No row cache but everything in page cache
- Alternative Cache File System Layout (V1)
- Standard Map Cache
- Alternative Cache New Layout (V2)
h3. No row cache, non compacted (average 2.5 SSTable reads for slices)
Get: 12k
Slice: 6.5k
h3. No row cache, Compacted
Get: 12k
Slice: 9.2k
h3. Alternative Cache V1
Get: 15.9k
Slice: 14.6k
h3. Good old non serializing row cache
Get: 25.4k
Slice: 23k
h3. Alternative Cache V2
Get: 25.5k
Slice: 24k
We still plan to take this live, but since I wrote more code than initially
thought, I need to write more unit tests.
So long.
> Alternative Row Cache Implementation
> ------------------------------------
>
> Key: CASSANDRA-2864
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2864
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Daniel Doubleday
> Assignee: Daniel Doubleday
> Priority: Minor
> Attachments: rowcache.patch
>
>
> We have been working on an alternative implementation to the existing row
> cache(s).
> We have 2 main goals:
> - Decrease memory -> get more rows in the cache without suffering a huge
> performance penalty
> - Reduce gc pressure
> This sounds a lot like we should be using the new serializing cache in 0.8.
> Unfortunately our workload consists of loads of updates which would
> invalidate the cache all the time.
> The second unfortunate thing is that the idea we came up with doesn't fit the
> new cache provider api...
> It looks like this:
> Like the serializing cache, we basically only cache the serialized byte
> buffer. We don't serialize the bloom filter, and we try to do some other minor
> compression tricks (var ints etc., not done yet). The main difference is that
> we don't deserialize but use the normal sstable iterators and filters as in
> the regular uncached case.
> So the read path looks like this:
> return filter.collectCollatedColumns(memtable iter, cached row iter)
> The write path is not affected. It does not update the cache.
> During flush we merge all memtable updates with the cached rows.
> The attached patch is based on the 0.8 branch, r1143352.
> It does not replace the existing row cache but sits alongside it. There's an
> environment switch to choose the implementation. This way it is easy to
> benchmark performance differences.
> -DuseSSTableCache=true enables the alternative cache. It shares its
> configuration with the standard row cache. So the cache capacity is shared.
> We have duplicated a fair amount of code. First we actually refactored the
> existing sstable filter / reader but then decided to minimize dependencies.
> Also, this way it is easy to customize serialization for in-memory sstable
> rows.
> We have also experimented a little with compression but since this task at
> this stage is mainly to kick off discussion we wanted to keep things simple.
> But there is certainly room for optimizations.