I think this is great. When we were trying to optimize the WAL, we set the
write_buffer and memtable settings very aggressively, which causes read
amplification. I was worried about that, but now we can have separate column
families:
Write-optimized for big stuff (WAL and overlay): trying to minimize
write amplification (WA).
Read-optimized for small stuff (onode, omap): trying to minimize
read amplification (RA).
Also, we can have a different cache policy for each family, which helps keep
WAL and overlay data from flushing the onodes out of the cache.
For WAL, NO CACHE
For onode, MAX_CACHE
For overlay, medium? No?
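To illustrate why the cache split matters, here is a rough toy model in Python. It is not RocksDB code: `LRUCache`, `surviving_onodes`, the 4-entry capacity, and the 100 WAL writes are all made up for the example. It only shows that large, short-lived WAL entries pushed through a shared LRU cache evict the onodes, while a no-cache policy for the WAL leaves them in place.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache; capacity is counted in entries, not bytes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def __contains__(self, key):
        return key in self.entries


def surviving_onodes(cached_families, n_onodes=4, n_wal_writes=100):
    """Count onodes still cached after a burst of WAL writes.

    cached_families is a hypothetical policy: the set of record kinds
    that are allowed to populate the shared cache.
    """
    cache = LRUCache(capacity=n_onodes)
    for i in range(n_onodes):
        cache.put(("onode", i), b"meta")
    for i in range(n_wal_writes):
        if "wal" in cached_families:
            cache.put(("wal", i), b"x" * 4096)
    return sum(("onode", i) in cache for i in range(n_onodes))


shared = surviving_onodes({"onode", "wal"})  # WAL goes through the cache
separate = surviving_onodes({"onode"})       # WAL bypasses the cache
print(shared, separate)
```

With the WAL allowed into the cache, none of the onodes survive the burst; with the WAL excluded, all of them do. A real setup would express this through per-column-family table options rather than a policy set like this one.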
But since transaction TPS is the main bottleneck now, maybe we can delay
this for a while?
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Sage Weil
Sent: Wednesday, April 22, 2015 4:56 AM
To: [email protected]
Subject: newstore and rocksdb column families
Dhruba (rocksdb dev) asked if column families might be a good fit for
controlling the WAL behavior. I'm not certain it addresses specifically the
WAL behavior, but it creates a bunch of opportunities for segregating the
overlay and/or wal records out from the regular metadata (onodes, omap). The
short version is that each column family has its own memtable and sstable
files, but everything shares the same WAL, so you still get the atomicity.
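As a rough picture of that arrangement, the following is a toy Python model, not the RocksDB API: `ToyDB`, `write_batch`, `recover`, and the family names are hypothetical. It assumes a batch is logged as a single WAL record, which is what lets a write spanning several column families be replayed all-or-nothing while each family keeps its own memtable.

```python
class ToyDB:
    """Toy model: one shared WAL, one memtable (a dict) per column family."""

    def __init__(self, families):
        self.families = list(families)
        self.wal = []  # single log shared by every family
        self.memtables = {name: {} for name in self.families}

    def write_batch(self, ops):
        """Apply a list of (family, key, value) ops as one atomic batch."""
        for family, _key, _value in ops:
            if family not in self.memtables:
                raise KeyError(f"unknown column family: {family}")
        # The whole batch becomes ONE log record, even if it spans families.
        self.wal.append(list(ops))
        for family, key, value in ops:
            self.memtables[family][key] = value

    @classmethod
    def recover(cls, families, wal):
        """Rebuild every family's memtable by replaying the shared log."""
        db = cls(families)
        for record in wal:
            db.write_batch(record)
        return db


db = ToyDB(["default", "overlay"])
db.write_batch([
    ("default", "onode/1", b"metadata"),
    ("overlay", "overlay/1/0", b"large payload"),
])
recovered = ToyDB.recover(db.families, db.wal)
```

After replay, `recovered.memtables` matches `db.memtables`, and the two-family write occupies a single entry in `db.wal`. Flushing and compaction, where the per-family memtables and sstables would actually diverge, are left out of this sketch.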
I suspect this would be most helpful for the overlay records, where we'll have
reasonably large key/value pairs with medium to long lifespans. I'm not sure
how helpful it will be with our wal records since if they make it out of the
log at all we are already losing. :/
Anyway, something to consider!
https://github.com/facebook/rocksdb/wiki/Column-Families
sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the
body of a message to [email protected] More majordomo info at
http://vger.kernel.org/majordomo-info.html