[
https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000516#comment-13000516
]
T Jake Luciani commented on CASSANDRA-1902:
-------------------------------------------
Attached new version based on previous feedback.
This takes a lazy approach and migrates pages per column, so it will handle
active column slices etc.
The page cache info of an SSTable is now stored in an OpenBitSet, so a 1TB SSTable
will fit in ~32MB assuming a 4KB page size.
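As a rough check on the size estimate, here is a minimal sketch that assumes the bit set holds exactly one bit per page. PageBitSet and bitset_size_bytes are hypothetical names for illustration; this is not the patch's code and not Lucene's OpenBitSet. Taking 1TB as 2^40 bytes and 4KB pages gives 2^28 bits, which is 32MB.

```python
class PageBitSet:
    """Hypothetical stand-in for a per-SSTable bit set: bit i is set
    when page i of the file is resident in the OS page cache."""

    def __init__(self, file_size, page_size=4096):
        self.page_size = page_size
        # Round up so a trailing partial page still gets a bit.
        self.num_pages = (file_size + page_size - 1) // page_size
        self.bits = bytearray((self.num_pages + 7) // 8)

    def set(self, page):
        self.bits[page >> 3] |= 1 << (page & 7)

    def get(self, page):
        return bool(self.bits[page >> 3] & (1 << (page & 7)))


def bitset_size_bytes(file_size, page_size=4096):
    """Bytes needed to track one bit per page, without allocating anything."""
    num_pages = (file_size + page_size - 1) // page_size
    return (num_pages + 7) // 8


one_tb = 2 ** 40
print(bitset_size_bytes(one_tb) / 2 ** 20, "MB for a 1TB file")
```

The real bit set may carry extra overhead (for example word-sized backing storage), so this is a lower bound under the stated assumption.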
Here are the stats with the CASSANDRA-1470 + CASSANDRA-1902 flag off and on:
{code}
#write 100k wide rows to CF1
python stress.py -S 3000 -n 100000 -k
(off) 0.7.3: 81
(on) 0.7.3: 93
#flush the keyspace wait for compaction to finish
#read from CF1 till in page cache
python stress.py -n 100000 -k -o read
(off) 0.7.3: 73, 13
(on) 0.7.3: 65, 13
#write 100k wide rows to CF2
python stress.py -S 3000 -n 100000 -k -y super
(off) 0.7.3: 79
(on) 0.7.3: 99
#wait for compaction to finish
#read from CF1 till in page cache
python stress.py -n 100000 -k -o read
(off) 0.7.3: 227, 36, 14
(on) 0.7.3: 97, 31, 14
#overwrite 100k wide rows to CF1
python stress.py -S 3000 -n 100000 -k
(off) 0.7.3: 100
(on) 0.7.3: 123
#perform major compaction on CF1
#this will test cache migration
#read from CF1 till in page cache
python stress.py -n 100000 -k -o read
(off) 0.7.3: 148
(on) 0.7.3: 97
{code}
> Migrate cached pages during compaction
> ---------------------------------------
>
> Key: CASSANDRA-1902
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.7.1
> Reporter: T Jake Luciani
> Assignee: T Jake Luciani
> Fix For: 0.7.4
>
> Attachments:
> 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt,
> 1902-formatted.txt, 1902-per-column-migration.txt
>
> Original Estimate: 32h
> Time Spent: 24h
> Remaining Estimate: 8h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a
> pre-compacted CF during the compaction process.
> First, add a method to MmappedSegmentFile: long[] pagesInPageCache() that
> uses the mincore() system call (Linux/BSD) to detect the offsets of pages
> for this file currently in page cache.
> Then add getActiveKeys() which uses underlying pagesInPageCache() to get the
> keys actually in the page cache.
> Use getActiveKeys() to detect which SSTables being compacted are in the OS
> cache and make sure the corresponding pages in the new compacted SSTable are
> kept in the page cache for these keys. This will minimize the impact of
> compacting a "hot" SSTable.
> A simpler yet similar approach is described here:
> http://insights.oetiker.ch/linux/fadvise/
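
The steps in the issue description can be sketched roughly as follows. This is an illustration under assumptions, not the attached patch: the index is modeled as a plain dict of key -> (offset, length), the resident pages are given as a set of page numbers (standing in for whatever pagesInPageCache() would return), and the function names are hypothetical.

```python
def pages_for_range(offset, length, page_size=4096):
    """Page numbers touched by the byte range [offset, offset + length)."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)


def active_keys(index, resident_pages, page_size=4096):
    """Keys with at least one page of their data resident in the page cache.

    index: dict mapping key -> (offset, length) in the pre-compaction file.
    resident_pages: set of page numbers currently in the page cache.
    """
    return {
        key
        for key, (offset, length) in index.items()
        if any(p in resident_pages
               for p in pages_for_range(offset, length, page_size))
    }


def pages_to_keep_warm(new_index, hot_keys, page_size=4096):
    """Pages of the newly compacted file that hold data for the hot keys."""
    pages = set()
    for key in hot_keys:
        if key in new_index:
            offset, length = new_index[key]
            pages.update(pages_for_range(offset, length, page_size))
    return pages


# Toy example: only page 2 of the old file is cached, which belongs to key "b".
old_index = {"a": (0, 100), "b": (8192, 5000), "c": (20000, 10)}
new_index = {"a": (0, 100), "b": (100, 5000), "c": (5100, 10)}
hot = active_keys(old_index, resident_pages={2})
print(sorted(hot), sorted(pages_to_keep_warm(new_index, hot)))
```

On Linux, the resulting page ranges could then be handed to something like posix_fadvise with POSIX_FADV_WILLNEED, in the spirit of the fadvise article linked above; that step is left out of the sketch.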