[ https://issues.apache.org/jira/browse/CASSANDRA-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000516#comment-13000516 ]
T Jake Luciani commented on CASSANDRA-1902:
-------------------------------------------

Attached a new version based on the previous feedback.

It takes a lazy approach and migrates pages per column, so it handles active column slices, etc.

The page cache info for an SSTable is now stored in an OpenBitSet, so the info for a 1 TB SSTable fits in ~32 MB (one bit per page), assuming a 4 KB page size.

Here are the stats with the 1470 + 1902 flag off and on:

{code}
#write 100k wide rows to CF1
python stress.py -S 3000 -n 100000 -k
(off) 0.7.3: 81
(on)  0.7.3: 93

#flush the keyspace, wait for compaction to finish

#read from CF1 till in page cache
python stress.py -n 100000 -k -o read
(off) 0.7.3: 73, 13
(on)  0.7.3: 65, 13

#write 100k wide rows to CF2
python stress.py -S 3000 -n 100000 -k -y super
(off) 0.7.3: 79
(on)  0.7.3: 99

#wait for compaction to finish

#read from CF1 till in page cache
python stress.py -n 100000 -k -o read
(off) 0.7.3: 227, 36, 14
(on)  0.7.3: 97, 31, 14

#overwrite 100k wide rows to CF1
python stress.py -S 3000 -n 100000 -k
(off) 0.7.3: 100
(on)  0.7.3: 123

#perform major compaction on CF1
#this will test cache migration

#read from CF1 till in page cache
python stress.py -n 100000 -k -o read
(off) 0.7.3: 148
(on)  0.7.3: 97
{code}

> Migrate cached pages during compaction
> ---------------------------------------
>
>                 Key: CASSANDRA-1902
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1902
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.1
>            Reporter: T Jake Luciani
>            Assignee: T Jake Luciani
>             Fix For: 0.7.4
>
>         Attachments: 0001-CASSANDRA-1902-cache-migration-impl-with-config-option.txt, 1902-formatted.txt, 1902-per-column-migration.txt
>
>   Original Estimate: 32h
>          Time Spent: 24h
>  Remaining Estimate: 8h
>
> Post CASSANDRA-1470 there is an opportunity to migrate cached pages from a
> pre-compacted CF during the compaction process.
> First, add a method to MmappedSegmentFile: long[] pagesInPageCache() that
> uses the POSIX mincore() function to detect the offsets of pages for this
> file currently in the page cache.
> Then add getActiveKeys(), which uses the underlying pagesInPageCache() to get
> the keys actually in the page cache.
> Use getActiveKeys() to detect which SSTables being compacted are in the OS
> cache and make sure the corresponding pages in the new compacted SSTable are
> kept in the page cache for these keys. This will minimize the impact of
> compacting a "hot" SSTable.
> A simpler yet similar approach is described here:
> http://insights.oetiker.ch/linux/fadvise/
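
For readers following the thread, here is a minimal Python sketch of the idea discussed in this issue: one residency bit per page of the old SSTable, consulted per column to decide which pages of the new SSTable should stay cached. It is illustrative only and is not the attached Java patch. The names bitmap_bytes, PageBitmap and pages_to_keep are hypothetical, and the (old_offset, new_offset, length) tuples are an assumed stand-in for whatever the compaction code tracks per column. The bitmap would be filled from something like mincore(), which is not shown here.

```python
PAGE_SIZE = 4096  # 4 KB page size, as assumed in the comment


def bitmap_bytes(file_size, page_size=PAGE_SIZE):
    """Bytes needed to hold one residency bit per page of a file."""
    pages = -(-file_size // page_size)  # ceiling division
    return -(-pages // 8)


class PageBitmap:
    """One bit per page; a set bit means the page was in the page cache."""

    def __init__(self, file_size, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.num_pages = -(-file_size // page_size)
        self.bits = bytearray(-(-self.num_pages // 8))

    def set_page(self, page):
        self.bits[page >> 3] |= 1 << (page & 7)

    def is_set(self, page):
        return bool(self.bits[page >> 3] & (1 << (page & 7)))

    def range_is_hot(self, offset, length):
        """True if any page overlapping [offset, offset + length) is set."""
        if length <= 0:
            return False
        first = offset // self.page_size
        last = (offset + length - 1) // self.page_size
        return any(self.is_set(p) for p in range(first, last + 1))


def pages_to_keep(old_bitmap, columns, page_size=PAGE_SIZE):
    """Return sorted page numbers of the NEW file that should stay cached.

    columns: iterable of (old_offset, new_offset, length) tuples, one per
    column copied during compaction (a hypothetical representation).
    """
    keep = set()
    for old_off, new_off, length in columns:
        if old_bitmap.range_is_hot(old_off, length):
            first = new_off // page_size
            last = (new_off + length - 1) // page_size
            keep.update(range(first, last + 1))
    return sorted(keep)
```

With one bit per 4 KB page, bitmap_bytes(2**40) gives 2**25 bytes, i.e. 32 MiB of bitmap for a 1 TiB file. The per-column check mirrors the "lazy" approach in the comment: only columns whose old pages were resident cause pages in the new file to be kept.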