On Wed, Oct 22, 2014 at 4:34 AM, Thomas Whiteway <thomas.white...@metaswitch.com> wrote:
> I’m working on an application using a Cassandra (2.1.0) cluster where
>
> - our entire dataset is around 22GB
>
> - each node has 48GB of memory but only a single (mechanical) hard disk
>
> - in normal operation we have a low level of writes and no reads
>
> - very occasionally we need to read rows very fast (>1.5K rows/second),
>   and only read each row once.
>
> When we try and read the rows it takes up to five minutes before Cassandra
> is able to keep up. The problem seems to be that it takes a while to get
> the data into the page cache and until then Cassandra can’t retrieve the
> data from disk fast enough (e.g. if I drop the page cache mid-test then
> Cassandra slows down for the next 5 minutes).
>

Use: populate_io_cache_on_flush

It's designed for this case. "flush" in this case also includes the "flush"
that comes at the end of compaction.

Kevin Burton's (hi! :D) https://code.google.com/p/linux-ftools/ will help you
keep the SSTables in the page cache when f/e rebooting nodes.

=Rob
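
For illustration only, here is a rough sketch of the underlying idea: reading the SSTable files sequentially before the burst of reads, so the kernel pulls them into the page cache at sequential-disk speed instead of one random seek per row. This is not what linux-ftools or populate_io_cache_on_flush do internally. The function name `warm_page_cache`, the `-Data.db` suffix filter, and the 1 MiB chunk size are all assumptions made for the example.

```python
import os


def warm_page_cache(data_dir, suffix="-Data.db", chunk_size=1 << 20):
    """Sequentially read every file under data_dir whose name ends with suffix.

    The bytes are discarded; the point is only that the kernel caches the
    blocks as a side effect of reading them. Returns the total bytes read.

    Assumptions (hypothetical, not from the thread): the suffix filter and
    the chunk size. Nothing here pins pages, so the kernel may still evict
    them under memory pressure.
    """
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if not name.endswith(suffix):
                continue
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

You would call it with the node's data directory, e.g. `warm_page_cache("/var/lib/cassandra/data")`, where that path is itself an assumption about the install. With a 22GB dataset and 48GB of memory per node the whole thing should fit in cache, but a plain read like this gives no guarantee that the pages stay resident.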