[jira] [Commented] (CASSANDRA-4175) Reduce memory, disk space, and cpu usage with a column name/id map

Edward Capriolo (JIRA) Sat, 22 Nov 2014 15:30:50 -0800

[ https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222249#comment-14222249 ]


Edward Capriolo commented on CASSANDRA-4175:
--------------------------------------------

There was once a suggestion from https://twitter.com/roflscaletips that said 
something to the effect of "make mongo faster by using small column names". The 
same advice applies here. If you name a column "wombat_walnut_crackerjacks" 
instead of "w", it is going to take up more space on disk. This is because 
cassandra stores the name along with the value of each column on disk; it is a 
row store, apparently.
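To put rough numbers on the point above, here is a back-of-the-envelope sketch. It assumes a hypothetical, simplified cell layout (a 2-byte name length, the name bytes, then a 4-byte int value) and ignores compression; Cassandra's real on-disk format also stores timestamps and other metadata, so only the relative difference between the two names is meaningful.

```python
import struct


def encode_cell(name: str, value: int) -> bytes:
    """Serialize one cell in a hypothetical, simplified layout:
    a 2-byte big-endian name length, the UTF-8 name bytes, then a
    4-byte big-endian signed int value. Real Cassandra cells also
    carry timestamps and other metadata, which are left out here."""
    name_bytes = name.encode("utf-8")
    return struct.pack(">H", len(name_bytes)) + name_bytes + struct.pack(">i", value)


long_cell = encode_cell("wombat_walnut_crackerjacks", 42)
short_cell = encode_cell("w", 42)

print(len(long_cell))   # 32  (2 + 26 + 4)
print(len(short_cell))  # 7   (2 + 1 + 4)

# The name is written again for every row, so the gap scales with row count.
rows = 1_000_000
print((len(long_cell) - len(short_cell)) * rows)  # 25000000 extra bytes
```

With a small int value, the long name makes up most of each cell in this model.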

A simple way to solve this would be to have the CQL language store some 
meta-data about alternate column names.

{quote}
Create table abc ( wombat_walnut_crackerjacks int (shortname w) );
{quote}

Then the query engine could allow either name to be used in a select clause.

{quote}
SELECT w from abc;
{quote}

{quote}
SELECT wombat_walnut_crackerjacks from abc;
{quote}

An even easier way is to name the column "w". This way you avoid having systems 
where a column needs two names, or systems that keep an internal database of 
column name->shorter column name. But what is the fun of just telling people to 
use short names when a complex solution can be engineered? :)

> Reduce memory, disk space, and cpu usage with a column name/id map
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-4175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>              Labels: performance
>             Fix For: 3.0
>
>
> We spend a lot of memory on column names, both transiently (during reads) and 
> more permanently (in the row cache).  Compression mitigates this on disk but 
> not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too 
> via very high allocation rates in the young generation, hence more GC 
> activity.
> Now that CQL3 provides us some guarantees that column names must be defined 
> before they are inserted, we could create a map of (say) 32-bit int column 
> ids to names, and use that internally right up until we return a resultset to 
> the client.
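The id-map idea in the issue description can be sketched roughly as follows. This is an illustration under assumptions, not Cassandra's implementation: `ColumnIdMap` and its methods are hypothetical names, ids are assigned in definition order, and a row is modelled as a plain dict.

```python
from typing import Any


class ColumnIdMap:
    """Hypothetical per-table registry of 32-bit column ids.

    Because CQL3 requires columns to be declared before they are written,
    ids can be assigned at schema-definition time and used internally
    in place of the name strings.
    """

    MAX_ID = 2**31 - 1  # keep ids within a signed 32-bit int

    def __init__(self) -> None:
        self._id_by_name: dict[str, int] = {}
        self._name_by_id: list[str] = []

    def define(self, name: str) -> int:
        """Register a column (idempotent) and return its id."""
        if name in self._id_by_name:
            return self._id_by_name[name]
        new_id = len(self._name_by_id)
        if new_id > self.MAX_ID:
            raise OverflowError("column id space exhausted")
        self._id_by_name[name] = new_id
        self._name_by_id.append(name)
        return new_id

    def encode_row(self, row: dict[str, Any]) -> dict[int, Any]:
        """Swap names for ids; a KeyError means the column was never defined."""
        return {self._id_by_name[name]: value for name, value in row.items()}

    def decode_row(self, row: dict[int, Any]) -> dict[str, Any]:
        """Translate ids back to names only when building the client resultset."""
        return {self._name_by_id[cid]: value for cid, value in row.items()}


ids = ColumnIdMap()
for column in ("wombat_walnut_crackerjacks", "id"):
    ids.define(column)

internal = ids.encode_row({"wombat_walnut_crackerjacks": 7, "id": 1})
print(internal)                  # {0: 7, 1: 1}
print(ids.decode_row(internal))  # {'wombat_walnut_crackerjacks': 7, 'id': 1}
```

Internally each cell carries a small int instead of a string, and names reappear only at the resultset boundary, which is where the issue proposes converting back.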



