Symbolizing column names for storage and cache efficiency

Evan Weaver Sun, 26 Jul 2009 00:29:21 -0700

This article (http://bit.ly/FJgTE) about MongoDB is interesting. The
authors prioritized low barriers to entry in their selection process
and ignored performance and scaling entirely.


Aside from that, they mention that for row-oriented storage,
serializing the same string column names to disk for every row is a
big waste of disk and cache space. As far as I know, this affects
Cassandra too.
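
To make the waste concrete, here is a rough back-of-the-envelope sketch
in Python. The function name and the sample row are made up for
illustration, and it ignores per-column metadata such as timestamps and
length prefixes, so treat it as an estimate of the idea rather than a
measurement of any real on-disk format.

```python
def name_overhead_fraction(row):
    """Fraction of a row's name+value bytes that is spent on column names.

    `row` maps column-name strings to raw value bytes. Per-column
    metadata (timestamps, length prefixes) is deliberately ignored.
    """
    name_bytes = sum(len(name.encode("utf-8")) for name in row)
    value_bytes = sum(len(value) for value in row.values())
    return name_bytes / (name_bytes + value_bytes)


# Hypothetical row: short values stored under descriptive column names.
sample_row = {"user_id": b"1234", "created_at": b"20090726"}
fraction = name_overhead_fraction(sample_row)
```

With short values and descriptive names like these, the names can
easily take up more than half of the bytes, and that cost is paid again
in every row that uses the same columns.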

Would it be possible to add symbolized column names (replacing each
repeated name with a short ID) in a forward-compatible way? The
registries could be scoped per sstable and always kept in memory. Each
node could individually decide whether a column name is duplicated
enough to be worth symbolizing, and apply the transformation in the
compaction phase.
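
A minimal sketch of what such a per-sstable registry might look like,
in Python for brevity. Everything here is hypothetical: the function
names, the `min_count` threshold, and the trick of telling symbols from
raw names by type are assumptions for illustration, not anything that
exists in Cassandra.

```python
from collections import Counter


def build_registry(rows, min_count=2):
    """Assign small integer ids to column names seen at least min_count times.

    `rows` is a list of dicts (column name -> value) standing in for the
    rows being rewritten during one compaction.
    """
    counts = Counter(name for row in rows for name in row)
    frequent = sorted(name for name, n in counts.items() if n >= min_count)
    return {name: symbol for symbol, name in enumerate(frequent)}


def encode_row(row, registry):
    """Replace registered names with their integer id; leave rare names as-is."""
    return [(registry.get(name, name), value) for name, value in row.items()]


def decode_row(encoded, registry):
    """Invert encode_row using the same registry."""
    reverse = {symbol: name for name, symbol in registry.items()}
    return {
        (reverse[key] if isinstance(key, int) else key): value
        for key, value in encoded
    }


rows = [
    {"name": "alice", "email": "a@example.com"},
    {"name": "bob", "nickname": "bobby"},
]
registry = build_registry(rows)          # only "name" repeats
encoded = [encode_row(r, registry) for r in rows]
decoded = [decode_row(e, registry) for e in encoded]
```

Because names that are not in the registry pass through unchanged, an
sstable with an empty registry would look exactly like an unsymbolized
one, which is one way the forward-compatibility question could be
approached.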

Of course there are pitfalls, but it seems like it could be a big boon
to effective cache size in row-oriented applications.

Evan

-- 
Evan Weaver
