[ https://issues.apache.org/jira/browse/CASSANDRA-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1255:
--------------------------------------

    Attachment: 1255.txt

I took a stab at a slightly different approach:

- we only try to intern column names
- we intern per-memtable

My reasoning is that the only values that stay resident in the heap long-term 
are the objects in memtables and in the row cache.  Interning keys buys us 
nothing, since the maps involved already make them unique.

A per-memtable approach lets us avoid wasting time trying to intern 
materialized-view-type CFs, which have effectively unique names for each 
column.  We simply skip the intern step once the intern map grows larger than 
a threshold (currently 128), which indicates "probably not a CF with static 
column names."
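
To make the threshold idea concrete, here is a minimal sketch of a 
per-memtable interner.  It is not the code in 1255.txt: the class name 
ColumnNameInterner, the constant MAX_INTERNED_NAMES, and the choice to clear 
the map once the threshold is hit are all assumptions made for illustration. 
It is also single-threaded; a real memtable would need a concurrent map.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not from the attached patch): one instance per memtable.
final class ColumnNameInterner {
    // Threshold quoted in the comment above; beyond it we assume the CF
    // does not have a small, static set of column names.
    static final int MAX_INTERNED_NAMES = 128;

    // ByteBuffer gives content-based hashCode/equals, which a raw byte[] lacks.
    // Callers must not mutate a name after interning it, because the key
    // wraps the same array.
    private final Map<ByteBuffer, byte[]> canonical = new HashMap<>();
    private boolean disabled = false;

    /** Returns the canonical array for this name, or the argument itself once interning is off. */
    byte[] intern(byte[] name) {
        if (disabled) {
            return name;
        }
        ByteBuffer key = ByteBuffer.wrap(name);
        byte[] existing = canonical.get(key);
        if (existing != null) {
            return existing;
        }
        if (canonical.size() >= MAX_INTERNED_NAMES) {
            // Too many distinct names: stop trying and release the map
            // (clearing it is an assumption of this sketch).
            disabled = true;
            canonical.clear();
            return name;
        }
        canonical.put(key, name);
        return name;
    }

    boolean isDisabled() {
        return disabled;
    }
}
```

With static column names, every repeat of a name resolves to the first array 
seen, so the duplicates become garbage immediately.  With effectively unique 
names, the map fills after 128 distinct entries and every later call returns 
its argument untouched.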

This patch only checks top-level column names; if it looks promising we can add 
subcolumn names fairly easily.

It may even be worth doing a similar check for column values, since if the data 
is not suitable for interning we hit the threshold and stop trying fairly 
quickly.  But there are so many more data patterns that are not going to be 
internable that it is probably still a bad trade.

> Explore interning keys and column names
> ---------------------------------------
>
>                 Key: CASSANDRA-1255
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1255
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.7.1
>
>         Attachments: 1255.txt
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> With multiple Memtables, key caches and row caches holding DecoratedKey 
> references, it could potentially be a huge memory savings (and relief to GC) 
> to intern DecoratedKeys. Taking the idea further, for the skinny row pattern, 
> and for certain types of wide row patterns, interning of column names could 
> be very beneficial as well (although we would need to wrap the byte[]s in 
> something for hashCode/equals).
> This ticket should explore the benefits and overhead of interning.
> Google Collections/Guava MapMaker is a very convenient way to create this 
> type of cache. Example call: 
> http://stackoverflow.com/questions/2865026/use-permgen-space-or-roll-my-own-intern-method/2865083#2865083

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
