[ https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222249#comment-14222249 ]
Edward Capriolo commented on CASSANDRA-4175:
--------------------------------------------

There was once a https://twitter.com/roflscaletips suggestion that said something to the effect of "make mongo faster by using small column names". The same advice applies here.

If you name a column "wombat_walnut_crackerjacks" instead of "w", it is going to take up more space on disk. This is because Cassandra stores the column name along with the value of each column on disk, because it is a row store, apparently.

A simple way to solve this would be to have the CQL language store some metadata about alternate column names.

{quote}
Create table abc ( wombat_walnut_crackerjacks int (shortname w) );
{quote}

Then the query engine could allow either name to be used in a select clause.

{quote}
SELECT w from abc;
{quote}
{quote}
SELECT wombat_walnut_crackerjacks from abc;
{quote}

An even easier way is to name the column "w". This way you avoid having systems where a column needs two names, or systems where column names have an internal database of column name -> shorter column name. But what is the fun of just telling people to use short names when a complex solution can be engineered :)

> Reduce memory, disk space, and cpu usage with a column name/id map
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-4175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>              Labels: performance
>             Fix For: 3.0
>
>
> We spend a lot of memory on column names, both transiently (during reads) and
> more permanently (in the row cache). Compression mitigates this on disk but
> not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too
> via very high allocation rates in the young generation, hence more GC
> activity.
> Now that CQL3 provides us some guarantees that column names must be defined
> before they are inserted, we could create a map of (say) 32-bit int column
> id, to names, and use that internally right up until we return a resultset to
> the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
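
For illustration, the column name/id map described in the issue could be sketched roughly as below. This is a minimal sketch in Python, not Cassandra's implementation: the class `ColumnIdMap`, its methods, and the helper `key_size_bytes` are hypothetical names invented here, and the size comparison only counts the bytes of the column key itself, ignoring any other per-cell overhead.

```python
import struct


class ColumnIdMap:
    """Hypothetical two-way map between column names and 32-bit integer ids."""

    MAX_IDS = 2 ** 32  # ids must fit in an unsigned 32-bit int

    def __init__(self):
        self._id_by_name = {}
        self._name_by_id = []

    def define(self, name):
        """Register a column when the schema is defined; return its id."""
        if name not in self._id_by_name:
            if len(self._name_by_id) >= self.MAX_IDS:
                raise OverflowError("no 32-bit column ids left")
            self._id_by_name[name] = len(self._name_by_id)
            self._name_by_id.append(name)
        return self._id_by_name[name]

    def encode(self, row):
        """Swap names for ids; raises KeyError for an undefined column."""
        return {self._id_by_name[name]: value for name, value in row.items()}

    def decode(self, row):
        """Swap ids back to names just before returning rows to the client."""
        return {self._name_by_id[cid]: value for cid, value in row.items()}


def key_size_bytes(key):
    """Bytes needed for the column key alone: UTF-8 name or packed 32-bit id."""
    if isinstance(key, str):
        return len(key.encode("utf-8"))
    return len(struct.pack(">I", key))


columns = ColumnIdMap()
long_id = columns.define("wombat_walnut_crackerjacks")

internal = columns.encode({"wombat_walnut_crackerjacks": 42})
external = columns.decode(internal)

name_bytes = key_size_bytes("wombat_walnut_crackerjacks")
id_bytes = key_size_bytes(long_id)
```

In this sketch the long name is only carried at the edges (schema definition and result-set construction), while everything in between handles the fixed-size id. Naming the column "w" gets a similar saving without any map, which is the point of the comment above.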