[
https://issues.apache.org/jira/browse/CASSANDRA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932748#comment-13932748
]
Edward Capriolo commented on CASSANDRA-1657:
--------------------------------------------
{quote}
"yet you most definitely want all the things that Cassandra offers in terms of
replication, consistency, durability etc."
"In order to semi-deterministically ensure acceptable performance for such
data, Cassandra could support in-memory column families. Such an in-memory
column family would imply that mlock() be used on sstables for this column
family. On start-up and on compaction completion, they could be mmap():ed with
MAP_POPULATE (Linux specific) or else just mmap():ed + mlock():ed in such a way
as to otherwise guarantee it is in-memory (such as userland traversal of the
entire file)."
{quote}
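To make the "userland traversal of the entire file" idea concrete, here is a minimal sketch in Python (not Cassandra code). The function name `prefault_file` is hypothetical. It assumes a Unix system and uses `MAP_POPULATE` only where the Python build exposes it. It only pre-faults pages and does not call mlock(), so the kernel may still evict them under memory pressure.

```python
import mmap
import os


def prefault_file(path: str) -> int:
    """Map a file read-only and touch every page so it is faulted in.

    Returns the number of pages touched. Hypothetical helper for illustration;
    it does NOT pin pages (no mlock), it only reads them once.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be mapped

    # MAP_POPULATE is Linux-specific and only exposed by newer Python versions;
    # fall back to 0 (no extra flag) when it is unavailable.
    populate = getattr(mmap, "MAP_POPULATE", 0)

    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), size,
                       flags=mmap.MAP_SHARED | populate,
                       prot=mmap.PROT_READ) as m:
            # Userland traversal: read one byte per page to force a page fault.
            for offset in range(0, size, mmap.PAGESIZE):
                m[offset]
                touched += 1
    return touched
```

Pinning the pages as the description proposes would additionally need mlock(), which Python's standard library does not wrap directly.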
I totally understand the perspective of letting Cassandra operate as it
currently does and simply keeping the data on a RAM disk or locked in memory.
However, this seems wasteful to me, because it uses more memory than the
physical data itself.
I wonder whether AOF (the append-only file) from Redis would fit our needs
well. It is durable:
http://redis.io/topics/persistence
{quote}
AOF advantages
Using AOF Redis is much more durable: you can have different fsync
policies: no fsync at all, fsync every second, fsync at every query. With the
default policy of fsync every second write performances are still great (fsync
is performed using a background thread and the main thread will try hard to
perform writes when no fsync is in progress.) but you can only lose one second
worth of writes.
The AOF log is an append only log, so there are no seeks, nor corruption
problems if there is a power outage. Even if the log ends with an half-written
command for some reason (disk full or other reasons) the redis-check-aof tool
is able to fix it easily.
Redis is able to automatically rewrite the AOF in background when it gets
too big. The rewrite is completely safe as while Redis continues appending to
the old file, a completely new one is produced with the minimal set of
operations needed to create the current data set, and once this second file is
ready Redis switches the two and starts appending to the new one.
AOF contains a log of all the operations one after the other in an easy to
understand and parse format. You can even easily export an AOF file. For
instance even if you flushed everything for an error using a FLUSHALL command,
if no rewrite of the log was performed in the meantime you can still save your
data set just stopping the server, removing the latest command, and restarting
Redis again.
{quote}
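As a rough illustration of the quoted behaviour, here is a small Python sketch of an append-only log with the three fsync policies and a replay step that ignores a half-written final record. This is not Redis's or Cassandra's implementation. The class name, the JSON-lines record format, and the policy names `"no"`, `"everysec"`, `"always"` are assumptions made for the example. It also checks elapsed time on each append instead of using a background fsync thread.

```python
import json
import os
import time


class AppendOnlyLog:
    """Toy append-only log; fsync_policy is 'no', 'everysec', or 'always'."""

    def __init__(self, path, fsync_policy="everysec"):
        if fsync_policy not in ("no", "everysec", "always"):
            raise ValueError("unknown fsync policy: %r" % fsync_policy)
        self.policy = fsync_policy
        self._file = open(path, "ab")
        self._last_sync = time.monotonic()

    def append(self, op, key, value=None):
        # One JSON array per line; the trailing newline marks a complete record.
        record = json.dumps([op, key, value]).encode("utf-8") + b"\n"
        self._file.write(record)
        self._file.flush()  # hand the bytes to the OS; not yet durable
        now = time.monotonic()
        due = self.policy == "everysec" and now - self._last_sync >= 1.0
        if self.policy == "always" or due:
            os.fsync(self._file.fileno())  # force the bytes to stable storage
            self._last_sync = now

    def close(self):
        self._file.flush()
        if self.policy != "no":
            os.fsync(self._file.fileno())
        self._file.close()


def replay(path):
    """Rebuild the data set from the log, dropping a truncated last record."""
    with open(path, "rb") as f:
        data = f.read()
    # Everything after the final newline is a half-written record; discard it.
    complete, _, _partial = data.rpartition(b"\n")
    state = {}
    for line in complete.split(b"\n"):
        if not line:
            continue
        op, key, value = json.loads(line)
        if op == "set":
            state[key] = value
        elif op == "del":
            state.pop(key, None)
    return state
```

With `"always"` every append pays for an fsync. With `"everysec"` up to about a second of writes can be lost. With `"no"` the OS decides when the data reaches disk.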
I feel like we may want to make these settings parameters of the storage
engine. If people want to play fast and loose, they can turn off the
durability.
> support in-memory column families
> ---------------------------------
>
> Key: CASSANDRA-1657
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1657
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Peter Schuller
> Assignee: Edward Capriolo
> Priority: Minor
>
> Some workloads are such that you absolutely depend on column families being
> in-memory for performance, yet you most definitely want all the things that
> Cassandra offers in terms of replication, consistency, durability etc.
> In order to semi-deterministically ensure acceptable performance for such
> data, Cassandra could support in-memory column families. Such an in-memory
> column family would imply that mlock() be used on sstables for this column
> family. On start-up and on compaction completion, they could be mmap():ed
> with MAP_POPULATE (Linux specific) or else just mmap():ed + mlock():ed in
> such a way as to otherwise guarantee it is in-memory (such as userland
> traversal of the entire file).
--
This message was sent by Atlassian JIRA
(v6.2#6252)