I think this is great. When we were trying to optimize the WAL, we set the 
write_buffer and memtable very aggressively, which causes read amplification. 
I was worried about that, but now we can have separate column families:
        Write-optimized for big stuff (WAL and overlay): trying to minimize 
the write amplification (WA).
        Read-optimized for small stuff (onode, omap): trying to minimize 
the read amplification (RA).
Also, we can have a different cache policy per column family, which helps 
prevent the WAL and overlay from flushing the onodes out of the cache.
        For WAL, NO CACHE
        For onode, MAX_CACHE
        For overlay, medium? No?
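
To make the cache argument concrete, here is a toy simulation. It is only a sketch: the LRUCache class, the value sizes, and the cache budgets are made-up assumptions for illustration, not RocksDB's block cache or its API. It compares one shared cache against a per-family split (no cache for WAL, the largest share for onodes, a medium share for overlay).

```python
from collections import OrderedDict


class LRUCache:
    """Tiny byte-budgeted LRU cache (toy model, not RocksDB's block cache)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # key -> size in bytes, oldest first

    def put(self, key, size):
        if self.capacity == 0 or size > self.capacity:
            return  # caching disabled, or the item can never fit
        if key in self.items:
            self.used -= self.items.pop(key)
        # Evict least recently used entries until the new item fits.
        while self.used + size > self.capacity:
            _, evicted_size = self.items.popitem(last=False)
            self.used -= evicted_size
        self.items[key] = size
        self.used += size

    def get(self, key):
        if key not in self.items:
            return False
        self.items.move_to_end(key)  # mark as most recently used
        return True


def onodes_still_cached(caches, n_onodes=100):
    """Cache small onodes, stream large WAL/overlay values, count onode hits."""
    for i in range(n_onodes):
        caches["onode"].put(("onode", i), 256)            # assumed onode size
    for i in range(50):
        caches["wal"].put(("wal", i), 64 * 1024)          # assumed WAL record size
        caches["overlay"].put(("overlay", i), 16 * 1024)  # assumed overlay size
    return sum(caches["onode"].get(("onode", i)) for i in range(n_onodes))


# One cache shared by everything: the big values push the onodes out.
shared = LRUCache(1 << 20)
shared_hits = onodes_still_cached(
    {"wal": shared, "onode": shared, "overlay": shared})

# Per-family policy with the same total budget of 1 MiB.
split_hits = onodes_still_cached({
    "wal": LRUCache(0),               # NO CACHE
    "onode": LRUCache(768 * 1024),    # largest share
    "overlay": LRUCache(256 * 1024),  # medium
})

print("onode hits, shared cache:", shared_hits)
print("onode hits, split caches:", split_hits)
```

With these made-up sizes the shared cache loses every onode, while the split keeps all of them, because WAL and overlay traffic never touches the onode budget.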


But since transaction TPS is the main bottleneck now, maybe we can delay 
this for a while? 

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Sage Weil
Sent: Wednesday, April 22, 2015 4:56 AM
To: [email protected]
Subject: newstore and rocksdb column families

Dhruba (rocksdb dev) asked if column families might be a good fit for 
controlling the WAL behavior.  I'm not certain it addresses specifically the 
WAL behavior, but it creates a bunch of opportunities for segregating the 
overlay and/or wal records out from the regular metadata (onodes, omap).  The 
short version is that each column family has its own memtable and sstable 
files, but everything shares the same WAL, so you still get the atomicity.
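
As a rough illustration of that short version, the toy model below keeps one shared log and one memtable per column family, and writes a batch that spans families as a single log record. ToyDB and its methods are hypothetical names invented for this sketch; this is not RocksDB code and it ignores sstables, flushing, and durability.

```python
class ToyDB:
    """Toy model: one shared log, one memtable per column family."""

    def __init__(self, families):
        self.wal = []  # single log shared by all column families
        self.memtables = {name: {} for name in families}

    def write(self, batch):
        """Apply a batch of (family, key, value) puts as one unit."""
        # Validate first so a bad batch changes nothing.
        for family, _, _ in batch:
            if family not in self.memtables:
                raise KeyError(f"unknown column family: {family}")
        # One log record covers every family touched by the batch.
        self.wal.append(list(batch))
        for family, key, value in batch:
            self.memtables[family][key] = value

    @classmethod
    def recover(cls, families, wal):
        """Rebuild every memtable by replaying the shared log."""
        db = cls(families)
        for record in wal:
            db.write(record)
        return db


db = ToyDB(["onode", "overlay"])
db.write([
    ("onode", "obj1", "size=4096"),
    ("overlay", "obj1/0", b"x" * 8192),
])

# Replaying the single log restores both families together.
restored = ToyDB.recover(["onode", "overlay"], db.wal)
```

Because the batch lands in the log as one record, replay restores either both the onode and the overlay entry or neither, even though they live in separate memtables.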

I suspect this would be most helpful for the overlay records, where we'll have 
reasonably large key/value pairs with medium to long lifespans.  I'm not sure 
how helpful it will be with our wal records since if they make it out of the 
log at all we are already losing.  :/

Anyway, something to consider!

https://github.com/facebook/rocksdb/wiki/Column-Families

sage

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the 
body of a message to [email protected] More majordomo info at  
http://vger.kernel.org/majordomo-info.html