Column names are stored per cell (moving to user@).
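To put a rough number on "column names are stored per cell": each column carries its own copy of its name plus some fixed bookkeeping next to its value, so a tiny value pays a large overhead. The Python sketch below is illustrative only. The field layout (2-byte name length, 1-byte flags, 8-byte timestamp, 4-byte value length, and for a super column a 4-byte local deletion time, 8-byte deletion timestamp and 4-byte column count) is an assumption and should be checked against the SSTable serialization code of the Cassandra version in use. The example names and values are hypothetical.

```python
# ASSUMED per-column layout (illustrative, not verified against a specific
# Cassandra release):
#   2-byte name length + name bytes
#   1-byte flags (e.g. deletion marker)
#   8-byte timestamp
#   4-byte value length + value bytes
NAME_LEN_BYTES = 2
FLAG_BYTES = 1
TIMESTAMP_BYTES = 8
VALUE_LEN_BYTES = 4

# ASSUMED super column header: name, local deletion time, deletion timestamp,
# and the number of subcolumns that follow.
SUPER_LOCAL_DELETION_BYTES = 4
SUPER_MARKED_FOR_DELETE_BYTES = 8
SUPER_COLUMN_COUNT_BYTES = 4


def column_size(name: bytes, value: bytes) -> int:
    """On-disk size of one column under the assumed layout."""
    return (NAME_LEN_BYTES + len(name) + FLAG_BYTES + TIMESTAMP_BYTES
            + VALUE_LEN_BYTES + len(value))


def expansion(name: bytes, value: bytes) -> float:
    """Ratio of on-disk bytes to the raw name + value bytes."""
    return column_size(name, value) / (len(name) + len(value))


def super_column_size(name: bytes, columns: list[tuple[bytes, bytes]]) -> int:
    """On-disk size of one super column and its subcolumns (assumed layout)."""
    header = (NAME_LEN_BYTES + len(name) + SUPER_LOCAL_DELETION_BYTES
              + SUPER_MARKED_FOR_DELETE_BYTES + SUPER_COLUMN_COUNT_BYTES)
    return header + sum(column_size(n, v) for n, v in columns)


if __name__ == "__main__":
    # Hypothetical example: a short type name and a 2-digit value.
    print(column_size(b"type", b"42"))           # 21
    print(expansion(b"type", b"42"))             # 3.5
    print(super_column_size(b"20100811", [(b"type", b"42")]))  # 47
```

Under these assumptions, 6 bytes of name and value become 21 bytes on disk, roughly the 3.5x to 4x expansion discussed in the quoted message, before counting row keys, indexes or bloom filters.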
On Mon, Aug 30, 2010 at 6:58 AM, Terje Marthinussen <tmarthinus...@gmail.com> wrote:
> Hi,
>
> I was just looking at an SSTable file after loading a dataset. The data load
> has no updates of data, but:
> - Columns can in some rare cases be added to existing super columns.
> - SuperColumns will be added to the same key (but not overwriting existing
> data). I batch these, but it is quite likely that there will be 2-3 updates
> to a key.
>
> This is a randomly selected SSTable file from a much bigger dataset.
>
> The data is stored as date(super)/type(column)/value.
> Date is a simple "20100811"-style string.
> Value is a small integer, 2 digits on average.
>
> If I run a simple strings on the SSTable and look for the data:
> value: 692 KB of data
> type: 4.01 MB of data
> date: 4.6 MB of data
>
> In total: 9.4 MB
>
> The size of the .db file, however, is 36.4 MB...
>
> The expansion from the column headers is bad enough, but I can somehow
> accept that.
> The almost 4x expansion on top of that is a bit harder to justify...
>
> Does anyone already know where this expansion comes from? Or do I need to
> take a careful look at the source (probably useful anyway :))
>
> Terje

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com