Hi Sage and Mark,
Chendi from our team ran the test on v0.91. The setup is 4 nodes with
40 HDDs in total, SSDs as journals, and replica=2.
Mounting a partition from the journal SSD at /current/omap raised peak 4K
random-write IOPS from 1524 to 2694, a gain of about 77%, while the other
IO patterns stayed the same.
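The quoted gain is a plain relative increase over the baseline. A quick check, purely illustrative (the helper name pct_increase is made up for this note):

```python
def pct_increase(before: float, after: float) -> float:
    """Relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Peak 4K random-write IOPS: baseline run vs. omap-on-SSD run.
gain = pct_increase(1524, 2694)
print(f"{gain:.1f}%")  # prints 76.8%
```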
Some details are below.
If this reproduces in other setups, I think it is worth spending some time
on the detection.
                  Prev       Omap2ssd
Runid             305        320
OP_SIZE           4k         4k
OP_TYPE           randwrite  randwrite
QD                qd8        qd8
Engine            vdb        vdb
server_num        4          4
client_num        2          2
rbd_num           40         40
RBD_FIO_IOPS      1524       2694
RBD_FIO_BW        6170.1     10864.23
RBD_FIO_Latency   209.3851   119.4587
osd_read_iops     7.862196   322.4334
osd_write_iops    7677.648   10930
osd_read_bw       0.446566   1.409266
(no header)       54.916435  71.33833
Xiaoxi
-----Original Message-----
From: Mark Nelson [mailto:[email protected]]
Sent: Wednesday, April 22, 2015 7:59 AM
To: Sage Weil; Chen, Xiaoxi
Cc: Haomai Wang; Somnath Roy; Duan, Jiangang; Zhang, Jian; ceph-devel
Subject: Re: Re: Re: Re: Re: Re: Re: NewStore performance analysis
On 04/21/2015 06:57 PM, Sage Weil wrote:
> On Tue, 21 Apr 2015, Chen, Xiaoxi wrote:
>> ---- Sage Weil wrote ----
>>
>>> On Tue, 21 Apr 2015, Chen, Xiaoxi wrote:
>>>> Haomai is right in theory, but I am not sure whether all
>>>> users (mon, filestore, kvstore) of the submit_transaction API clearly
>>>> expect that their data is not persistent and may be lost on
>>>> failure. So in rocksdb, sync now defaults to true even in
>>>> submit_transaction (and this option makes the two APIs behave
>>>> exactly the same).
>>>> Maybe we need to rename the API to
>>>> submit_transaction_persistent/nonpersistent to better describe the
>>>> behavior?
>>>
>>> Let's audit them, then. I think they are right, but we may as well
>>> confirm!
>>>
>>> Again, FileStore is the odd one out here because it relies on the
>>> syncfs(2) at commit time for everything.
>>>
>>
>> Yes, so maybe we don't need to expose the option to users; we can
>> decide whether to sync in the code logic.
>
> Yeah, I think it'll reduce confusion too. I suggest we do a pull
> request against master that does this... let me know if you want to do
> it, otherwise I will!
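A minimal Python sketch of the direction discussed here, where the durability contract is carried by the method name instead of a user-visible option such as kv_sync. ToyKVLog and both method names are hypothetical (they follow Xiaoxi's proposed naming, not Ceph's actual KeyValueDB interface), and os.fsync stands in for whatever synced write a real backend such as rocksdb would issue.

```python
import os
import tempfile
from typing import Mapping


class ToyKVLog:
    """Hypothetical append-only key/value log (not Ceph's KeyValueDB).

    Callers pick durability by choosing a method; there is no config knob.
    """

    def __init__(self, path: str) -> None:
        self._f = open(path, "ab")

    def _append(self, txn: Mapping[str, str]) -> None:
        for key, value in txn.items():
            self._f.write(f"{key}={value}\n".encode())
        # Push Python's buffer to the kernel. The data now sits in the
        # page cache and can still be lost on power failure.
        self._f.flush()

    def submit_transaction_nonpersistent(self, txn: Mapping[str, str]) -> None:
        # The caller must arrange durability later, e.g. via a
        # syncfs(2) at commit time, as FileStore does.
        self._append(txn)

    def submit_transaction_persistent(self, txn: Mapping[str, str]) -> None:
        self._append(txn)
        os.fsync(self._f.fileno())  # durable before returning

    def close(self) -> None:
        self._f.close()


# Usage: one transaction of each kind against a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "kv.log")
    db = ToyKVLog(log_path)
    db.submit_transaction_nonpersistent({"a": "1"})
    db.submit_transaction_persistent({"b": "2"})
    db.close()
    with open(log_path) as f:
        contents = f.read()

print(contents, end="")
```

Making the two behaviours distinct entry points means an audit of the callers (mon, filestore, kvstore) only has to check which method each one calls.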
>
>> I remember some folks in our team tried moving the KVDB to a partition
>> on SSD while leaving the other filestore data on HDD; as I recall, it
>> improved performance. This deployment is problematic with
>> kv_sync=false. I will check the data first, and then we can evaluate
>> whether we want to support this kind of deployment.
>
> We could detect this by doing a stat(2) on the current/omap/ vs
> current/ dirs and checking whether they are on different file systems.
> If so, we can do the syncfs(2) on both dirs. The btrfs case would
> probably not be practical, but we can error out in that case. But yeah,
> I'm not sure how important it would be to support this, since filestore
> doesn't use leveldb that heavily... and I'd prefer to limit our
> investment of time there if we can instead make newstore (or something
> else) better.
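A minimal sketch of the stat(2)-based detection Sage describes, in Python: two paths are on the same mounted file system when their st_dev values match, and if current/omap differs from current/, both get a syncfs(2). The function names, the osd_data argument, and calling syncfs through ctypes are assumptions for illustration. The sync step is Linux-only, and the sketch does not handle the btrfs case.

```python
import ctypes
import os
import tempfile
from typing import List


def on_different_fs(a: str, b: str) -> bool:
    """True if `a` and `b` live on different mounted file systems."""
    return os.stat(a).st_dev != os.stat(b).st_dev


def paths_to_sync(osd_data: str) -> List[str]:
    """Directories (hypothetical layout) that need a syncfs(2) at commit."""
    current = os.path.join(osd_data, "current")
    omap = os.path.join(current, "omap")
    targets = [current]
    if on_different_fs(current, omap):
        # omap is mounted separately, e.g. on a journal SSD partition.
        targets.append(omap)
    return targets


def syncfs(path: str) -> None:
    """Linux-only: call syncfs(2) on the file system containing `path`."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(path, os.O_RDONLY)
    try:
        if libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
    finally:
        os.close(fd)


def commit(osd_data: str) -> None:
    for path in paths_to_sync(osd_data):
        syncfs(path)


# Usage: a throwaway layout where omap sits on the same file system.
with tempfile.TemporaryDirectory() as osd_data:
    os.makedirs(os.path.join(osd_data, "current", "omap"))
    targets = [os.path.relpath(p, osd_data) for p in paths_to_sync(osd_data)]

print(targets)  # prints ['current']
```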
FWIW, the last time I tried putting leveldb on SSD, it didn't really help at all.
It's been a while so maybe that's changed, but newstore definitely seems like
the way forward to me.
Mark