[
https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698701#comment-13698701
]
Terje Marthinussen edited comment on CASSANDRA-4175 at 7/3/13 7:19 AM:
-----------------------------------------------------------------------
Hi,
Sorry for the late update.
Yes, we have a cluster with some 20-30 billion columns (maybe even closer to 40
billion by now) which implements a column name map and has been in production
for about 2 years.
I was actually looking at committing this 2 years ago together with a fairly
large number of other changes which were implemented in the column/supercolumn
serializer code, but I never got around to implementing a good way to push the
sstable version numbers into the serializer to make things backwards compatible
before focus moved resources elsewhere.
As mentioned above by others, while not benchmarked and proven, I had a very
good feeling the total change helped quite a bit on GC issues, memtables and a
bit on performance in general, but in terms of disk space, the benefit was
somewhat limited after sstable compression was implemented as the repeating
column names are compressed pretty well.
This is already 2 years ago (the cluster still runs by the way), but if memory
serves me right:
30-40% reduction in disk space without compression
10% reduction on top of compression (I did a test after it was implemented).
In my case, the implementation is actually hardcoded due to time constraints:
a static map which is global for the entire cassandra installation.
If committing this into cassandra, I believe my plan was split in 3, possibly
as 3 different implementation stages:
1. A simple config option (as a config file or as a columnfamily) where users
themselves can assign repeating column names. Sure, it is not as fancy as many
other options, but maybe we could open up to cover some strange corner case
usages here with things like substrings as well.
Think options to cover complex versions of patterns like date/times such as
20130701202020 where a large chunk of the column name repeats, but not all of
it.
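The substring idea could be sketched roughly as below. This is only an
illustration and not part of the existing implementation: the prefix table,
the function names and the longest-prefix rule are all assumptions.

```python
# Hypothetical prefix table: a repeating leading chunk maps to a small ID.
PREFIX_IDS = {"201307": 1, "201308": 2}


def split_on_known_prefix(name, prefix_ids=PREFIX_IDS):
    """Return (prefix_id, remainder); prefix_id is None when nothing matches."""
    # Try the longest prefixes first so the largest repeating chunk wins.
    for prefix in sorted(prefix_ids, key=len, reverse=True):
        if name.startswith(prefix):
            return prefix_ids[prefix], name[len(prefix):]
    return None, name


def join_prefix(prefix_id, remainder, prefix_ids=PREFIX_IDS):
    """Inverse of split_on_known_prefix: rebuild the original column name."""
    if prefix_id is None:
        return remainder
    id_to_prefix = {v: k for k, v in prefix_ids.items()}
    return id_to_prefix[prefix_id] + remainder
```

With this table, only the non-repeating tail of a name like 20130701202020
would need to be stored raw next to the prefix ID.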
In the current implementation, if there is a mapping entry, it converts the
string to a variable length integer which becomes the new column name. If there
is no mapping entry, it stores the raw data.
In our case, we have <40 repeating column names so I never need more than a 1
byte varint.
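A minimal sketch of the lookup described above, assuming a common
7-bits-per-byte varint layout (the exact varint format used in my code is not
shown here) and a made-up static map. The names NAME_TO_ID, encode_varint and
encode_column_name are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, low bits first.

    The high bit is set on every byte except the last one.
    """
    if n < 0:
        raise ValueError("id must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


# Hypothetical static map, standing in for the hardcoded global one.
NAME_TO_ID = {"user_id": 0, "created_at": 1, "status": 2}


def encode_column_name(name, name_to_id=NAME_TO_ID):
    """Return (is_mapped, payload): a varint ID if mapped, else the raw name."""
    col_id = name_to_id.get(name)
    if col_id is None:
        return False, name.encode("utf-8")
    return True, encode_varint(col_id)
```

With this layout any ID below 128 fits in a single byte, which is why fewer
than 40 mapped names never need more than 1 byte.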
I also modified the column format to add a "column feature bitmap" at the start
of each column. This allowed me to turn on/off name/id mapping as well as
things like TTLs and a handful of other metadata.
There is a bunch of 64 bit numbers in the column format which only have their
default value in 99.999% of all cases, and very often your column value is just
an 8 byte int, a boolean or a short text entry. That is, in most cases the
column metadata is many times larger than the value stored.
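To illustrate the bitmap idea, here is a rough sketch and not the actual
on-disk format: the flag values, field widths and function names are all
assumptions, and the mapped ID is stored as a single byte for brevity. Optional
fields are only written when their flag bit is set, and a raw name and a mapped
name deserialize to the same column.

```python
import struct

# Hypothetical flag bits for the leading "column feature bitmap" byte.
FLAG_NAME_MAPPED = 0x01
FLAG_HAS_TTL = 0x02


def serialize_column(name, value, ttl=None, name_to_id=None):
    """Write flags, then the name (ID or raw), optional TTL, then the value."""
    name_to_id = name_to_id or {}
    flags = 0
    body = bytearray()
    col_id = name_to_id.get(name)
    if col_id is not None:
        flags |= FLAG_NAME_MAPPED
        body.append(col_id)  # single-byte ID; raises ValueError above 255
    else:
        raw = name.encode("utf-8")
        body += struct.pack(">H", len(raw)) + raw
    if ttl is not None:
        flags |= FLAG_HAS_TTL
        body += struct.pack(">I", ttl)
    body += struct.pack(">I", len(value)) + value
    return bytes([flags]) + bytes(body)


def deserialize_column(data, id_to_name=None):
    """Read the flag byte first; it decides which fields follow."""
    id_to_name = id_to_name or {}
    flags, pos = data[0], 1
    if flags & FLAG_NAME_MAPPED:
        name = id_to_name[data[pos]]
        pos += 1
    else:
        (n,) = struct.unpack_from(">H", data, pos)
        pos += 2
        name = data[pos:pos + n].decode("utf-8")
        pos += n
    ttl = None
    if flags & FLAG_HAS_TTL:
        (ttl,) = struct.unpack_from(">I", data, pos)
        pos += 4
    (n,) = struct.unpack_from(">I", data, pos)
    pos += 4
    return name, data[pos:pos + n], ttl
```

Since each column carries its own flag byte, columns written before and after a
map entry was added can sit side by side in the same file.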
This would have been my first implementation, mostly because I have a working
implementation of it already and the mapping table would be very easy to move
to a config file with just a list of column names read at cassandra startup, or
stored in a similar way to column family and other internal config (just as
another keyspace for config). Unfortunately, it is also a bit of work to push
such config data down to the serializer, at least as the code was organized
2 years ago.
Notice again, you do not need any sort of atomic handling of the updates to the
map in this implementation. You can add map entries at any time. The result
after deserializing is always the same, as column names can have a mix of raw
and map id values thanks to the "column feature bitmap" that was introduced.
Entries that were stored as raw strings will eventually be replaced by IDs into
the map as compaction cleans things up.
2. Auto learning feature with mapping table per sstable.
This would be stage 2 of the implementation.
When starting to create a new SSTable, build a sampling of the most frequently
occurring column names and gradually start mapping them to IDs.
Add the mapping table to the end of the SSTable or in a separate .map file
(similar to index files) at the completion of sstable generation.
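As a rough sketch of the sampling step (none of this exists in my current
code): it counts names in one pass instead of learning gradually, and the
function name and the max_entries/min_count thresholds are made up for the
example.

```python
from collections import Counter


def build_sstable_name_map(column_names, max_entries=1000, min_count=2):
    """Map the most frequent column names to small integer IDs.

    The most frequent names get the lowest IDs, so they would also get the
    shortest encodings if IDs are stored as variable length integers.
    """
    counts = Counter(column_names)
    frequent = [
        name
        for name, count in counts.most_common(max_entries)
        if count >= min_count
    ]
    return {name: i for i, name in enumerate(frequent)}
```

Names that fall below min_count are simply left out of the map and would be
stored raw.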
The initial id mapping could be further improved by maintaining a global map of
column names. This "global map" would not be used for
serialization/deserialization. It would be used to pre-populate the value for a
sstable and would only be statistics to optimize things further by reducing the
number of mapping variances between sstables and reducing the number of raw
values getting stored a bit more.
The id map would still be local to each sstable in terms of storage, but having
such statistics would allow you to dramatically reduce the size of a
potentially shared id cache across sstables where a lot of mapping entries
would be identical.
Some may feel that we would run out of memory quickly or use a lot of extra
disk with maps per sstable, but I guess that we only really need to deal with
the top few thousand entries in each sstable and this would not be a problem to
keep in an idmap cache in terms of size.
This is really just the top X recurring column names or column name
sub-patterns.
If you have more unique column entries than this in an sstable, this will
probably not be the feature that will save the day anyway, as the benefit per
column entry will be quite small vs. the overhead, and the entire feature should
potentially disable itself automagically if there are no frequently repeating
patterns.
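One possible way to decide when the feature should disable itself, purely as a
sketch: compare the bytes saved on names against the cost of storing the map.
The cost model, the 7-bits-per-byte varint length and all names below are
assumptions.

```python
def varint_len(n):
    """Bytes needed for n in a 7-bits-per-byte varint encoding."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def estimated_net_savings(counts, name_map):
    """Bytes saved by replacing names with IDs, minus the stored map size.

    counts maps column name -> occurrences in the sstable;
    name_map maps column name -> assigned ID.
    """
    saved = 0
    overhead = 0
    for name, col_id in name_map.items():
        name_len = len(name.encode("utf-8"))
        saved += counts.get(name, 0) * (name_len - varint_len(col_id))
        overhead += name_len + varint_len(col_id)  # one map entry on disk
    return saved - overhead


def should_enable_mapping(counts, name_map):
    """Keep the mapping only when it is estimated to save space overall."""
    return estimated_net_savings(counts, name_map) > 0
```

A name that appears only once costs more in the map than it saves, so an
sstable made up of mostly unique names would end up with mapping turned off.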
3. I had some ideas for moving the mapping up from the serializer to allow
things like streaming entries including id maps between nodes, but things do
indeed quickly get ugly and I do not remember clearly how I had planned to do
this.
---
The reason I isolated the mapping function to the serializer is that it looked
incredibly messy to move this further "up" in the stack. Column sorts, range
scans, lookups...
Not fun at all, and if the memtable is serialized anyway the memory consumption
there and in disk cache is dramatically reduced.
Also... with a global static map here at startup time, I actually share the
mapped strings across most columns in memory anyway, as I believe they all
become pointers to my static compiled-in map (again, this gets a lot more
trivial to make work very well if this is a startup config, but yes, a bit less
user friendly).
I haven't looked at the cassandra code for way too long now.
Has it become easier to get to know sstable version numbers in the serializer
class now?
I could maybe check if someone in the team here would like to take a stab at
moving this to latest cassandra and commit it if the above implementation seems
interesting.
Part of it should be really easy to port as long as we can get a bit more info
into the serializer/deserializer.
> Reduce memory, disk space, and cpu usage with a column name/id map
> ------------------------------------------------------------------
>
> Key: CASSANDRA-4175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jonathan Ellis
> Fix For: 2.1
>
>
> We spend a lot of memory on column names, both transiently (during reads) and
> more permanently (in the row cache). Compression mitigates this on disk but
> not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too
> via very high allocation rates in the young generation, hence more GC
> activity.
> Now that CQL3 provides us some guarantees that column names must be defined
> before they are inserted, we could create a map of (say) 32-bit int column
> id, to names, and use that internally right up until we return a resultset to
> the client.