[ https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis resolved CASSANDRA-4175.
---------------------------------------
       Resolution: Duplicate
         Assignee:     (was: Jason Brown)
    Fix Version/s:     (was: 3.x)

Column name duplication is removed in CASSANDRA-8099. (See https://github.com/pcmanus/cassandra/blob/8099_engine_refactor/guide_8099.md.)

(We can do slightly better by encoding column ids in the schema, but doing it on a per-sstable basis is almost as good from a disk space perspective.)

IMO we should leave dealing with highly duplicated column *values* to the compression layer.

> Reduce memory, disk space, and cpu usage with a column name/id map
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-4175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>              Labels: performance
>
> We spend a lot of memory on column names, both transiently (during reads) and
> more permanently (in the row cache). Compression mitigates this on disk but
> not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too
> via very high allocation rates in the young generation, hence more GC
> activity.
> Now that CQL3 provides us some guarantees that column names must be defined
> before they are inserted, we could create a map from (say) 32-bit int column
> ids to names, and use that internally right up until we return a resultset to
> the client.
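
For readers unfamiliar with the idea in the quoted description, here is a minimal sketch of a column name/id map, written in Python for brevity. It is not Cassandra code and not what CASSANDRA-8099 implements: `ColumnNameMap`, `define`, `encode_row`, and `decode_row` are hypothetical names, and the sketch assumes every column name is defined before any row that uses it is written, as the description says CQL3 guarantees.

```python
from typing import Dict, List, Tuple


class ColumnNameMap:
    """Hypothetical bidirectional map between column names and small int ids."""

    def __init__(self) -> None:
        self._id_by_name: Dict[str, int] = {}
        self._name_by_id: List[str] = []

    def define(self, name: str) -> int:
        """Register a column name (as a schema definition would); return its id."""
        existing = self._id_by_name.get(name)
        if existing is not None:
            return existing  # redefining a known name keeps its id stable
        column_id = len(self._name_by_id)
        if column_id >= 2**31:  # assumption: ids must fit a signed 32-bit int
            raise OverflowError("column id space exhausted")
        self._id_by_name[name] = column_id
        self._name_by_id.append(name)
        return column_id

    def id_of(self, name: str) -> int:
        """Id of an already-defined column; undefined names raise KeyError."""
        return self._id_by_name[name]

    def name_of(self, column_id: int) -> str:
        """Name for an id; unknown or negative ids raise KeyError."""
        if not 0 <= column_id < len(self._name_by_id):
            raise KeyError(column_id)
        return self._name_by_id[column_id]


def encode_row(names: ColumnNameMap, row: Dict[str, object]) -> List[Tuple[int, object]]:
    """Internal form: cells keyed by id, so the name strings are not repeated per row."""
    return sorted(((names.id_of(n), v) for n, v in row.items()), key=lambda c: c[0])


def decode_row(names: ColumnNameMap, cells: List[Tuple[int, object]]) -> Dict[str, object]:
    """Translate ids back to names only when building the client-facing result."""
    return {names.name_of(i): v for i, v in cells}
```

In this sketch each internal cell carries one small integer instead of a copy of the name, and names are resolved only at the boundary where a result set is returned. Keeping one such map per sstable rather than in the schema would correspond to the per-sstable alternative mentioned in the resolution comment.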