Hi Somnath,

I totally agree with you.

I read the code for the sharded TP and the new OSD OpWQ. In the new implementation there is no longer a single lock for all PGs; instead, each lock covers a subset of the PGs (am I right?). That is very useful for reducing lock contention and so increasing parallelism. It is awesome work!
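Just to check my understanding of the sharding, here is a minimal sketch of the per-shard locking idea. The names (ShardedQueueSketch, Op, Shard) and the fixed worker-per-shard setup are mine, not the actual Ceph classes; the sketch only shows why hashing PGs onto shards with one mutex each reduces contention compared to a single global lock:

// shard_lock_sketch.cc -- a minimal, self-contained sketch of the per-shard
// locking idea (hypothetical names; this is not the actual Ceph ShardedOpWQ).
// Instead of one global lock for the whole queue, ops are hashed by PG id
// into N shards, each protected by its own mutex, so different PGs usually
// contend on different locks.

#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

struct Op {
  uint64_t pg_id;               // placement group this op belongs to
  std::function<void()> work;   // the actual work to run
};

class ShardedQueueSketch {
  struct Shard {
    std::mutex lock;            // one lock per shard, not one per queue
    std::condition_variable cond;
    std::deque<Op> ops;
  };
  std::vector<Shard> shards_;

 public:
  explicit ShardedQueueSketch(size_t num_shards) : shards_(num_shards) {}

  // All ops of a given PG land in the same shard, so per-PG ordering is kept,
  // while ops of different PGs usually take different locks.
  void enqueue(Op op) {
    Shard& s = shards_[op.pg_id % shards_.size()];
    {
      std::lock_guard<std::mutex> g(s.lock);
      s.ops.push_back(std::move(op));
    }
    s.cond.notify_one();
  }

  // Each worker thread serves exactly one shard and only takes that
  // shard's lock.
  void worker(size_t shard_idx, size_t ops_to_process) {
    Shard& s = shards_[shard_idx];
    for (size_t done = 0; done < ops_to_process; ++done) {
      Op op;
      {
        std::unique_lock<std::mutex> g(s.lock);
        s.cond.wait(g, [&] { return !s.ops.empty(); });
        op = std::move(s.ops.front());
        s.ops.pop_front();
      }
      op.work();                // run the op outside the shard lock
    }
  }
};

int main() {
  constexpr size_t kShards = 4;
  constexpr size_t kOpsPerShard = 8;
  ShardedQueueSketch q(kShards);
  std::atomic<int> processed{0};

  std::vector<std::thread> workers;
  for (size_t i = 0; i < kShards; ++i)
    workers.emplace_back(&ShardedQueueSketch::worker, &q, i, kOpsPerShard);

  // 32 fake ops spread over 32 PGs; pg_id % kShards picks the shard.
  for (uint64_t pg = 0; pg < kShards * kOpsPerShard; ++pg)
    q.enqueue(Op{pg, [&processed] { processed.fetch_add(1); }});

  for (auto& t : workers) t.join();
  std::cout << "processed " << processed.load() << " ops across "
            << kShards << " shards" << std::endl;
  return 0;
}

If that matches the real implementation, two PGs only contend when they hash to the same shard, which is what lets more OSD worker threads make progress at once.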
While working on the latency of a single IO (mainly 4K random write), I noticed that the OpWQ spends about 100+ us to transfer an IO from the messenger dispatcher to an OpWQ worker thread. Do you have any ideas on how to reduce this time span?

Thanks for your help.
Dong.

On 28 September 2014 13:46, Somnath Roy <[email protected]> wrote:
> Hi Dong,
> I don't think there is much benefit in the single-client scenario; a single
> client has its own limits. The benefit of the sharded TP is that a single OSD
> scales much better as the number of clients increases, since it increases
> parallelism (by reducing lock contention) at the filestore level. A quick
> check could look like this:
>
> 1. Create a single-node, single-OSD cluster and apply load with an increasing
> number of clients, e.g. 1, 3, 5, 8, 10. A small workload served from memory
> should be ideal.
> 2. Compare the code with the sharded TP against, say, firefly. You should see
> that firefly does not scale with the increasing number of clients.
> 3. Try top -H in the two cases; you should see more threads working in
> parallel in the sharded-TP case than with firefly.
>
> Also, I am sure this latency result will not hold under a heavy workload;
> there you should see more contention and, as a result, more latency.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Dong Yuan
> Sent: Saturday, September 27, 2014 8:45 PM
> To: ceph-devel
> Subject: Latency Improvement Report for ShardedOpWQ
>
> ===== Test Purpose =====
>
> Measure whether, and by how much, the sharded OpWQ is better than the
> traditional OpWQ for the random-write scenario.
>
> ===== Test Case =====
>
> 4K object WriteFull, repeated 10,000 times.
>
> ===== Test Method =====
>
> Insert the following static probes into the code while running the tests, to
> get the time span between enqueue and dequeue in the OpWQ.
>
> Start: PG::enqueue_op, just before the osd->op_wq.enqueue call
> End: entry of OSD::dequeue_op
>
> ===== Test Result =====
>
> Traditional OpWQ: 109 us (avg), 40 us (min)
> ShardedOpWQ: 97 us (avg), 32 us (min)
>
> ===== Test Conclusion =====
>
> No remarkable improvement in latency.
>
> --
> Dong Yuan
> Email:[email protected]

--
Dong Yuan
Email:[email protected]
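P.S. For anyone who wants to repeat the measurement from the report quoted above: the probes were essentially two timestamps, one taken just before the op is pushed into the OpWQ and one taken when the worker thread picks it up. The standalone program below only illustrates that idea with a plain mutex-protected queue and std::chrono; it is not the actual Ceph probe code (the real static probes sit at the points listed under "Test Method").

// opwq_probe_sketch.cc -- a toy version of the measurement described under
// "Test Method" above: timestamp an op right before it is enqueued and again
// when the worker thread dequeues it, then report the average and minimum
// delta. The queue, names and clock calls here are made up for illustration;
// the real probes sit around the osd->op_wq enqueue and OSD::dequeue_op.

#include <chrono>
#include <condition_variable>
#include <deque>
#include <iostream>
#include <limits>
#include <mutex>
#include <thread>

using Clock = std::chrono::steady_clock;

struct QueuedOp {
  int id;
  Clock::time_point enqueue_ts;   // probe 1: taken just before the enqueue
};

int main() {
  std::mutex lock;
  std::condition_variable cond;
  std::deque<QueuedOp> queue;
  constexpr int kOps = 10000;     // mirrors the 10,000 writes of the test

  // Consumer: plays the role of the OpWQ worker thread.
  std::thread worker([&] {
    long long total_us = 0;
    long long min_us = std::numeric_limits<long long>::max();
    for (int done = 0; done < kOps; ++done) {
      QueuedOp op;
      {
        std::unique_lock<std::mutex> g(lock);
        cond.wait(g, [&] { return !queue.empty(); });
        op = queue.front();
        queue.pop_front();
      }
      // probe 2: dequeue time, the analogue of OSD::dequeue_op entry
      long long delta = std::chrono::duration_cast<std::chrono::microseconds>(
                            Clock::now() - op.enqueue_ts).count();
      total_us += delta;
      if (delta < min_us) min_us = delta;
    }
    std::cout << "avg " << total_us / kOps << " us, min " << min_us
              << " us" << std::endl;
  });

  // Producer: plays the role of the messenger dispatch thread.
  for (int i = 0; i < kOps; ++i) {
    QueuedOp op{i, Clock::now()};  // probe 1: timestamp before the enqueue
    {
      std::lock_guard<std::mutex> g(lock);
      queue.push_back(op);
    }
    cond.notify_one();
  }

  worker.join();
  return 0;
}

It builds with g++ -std=c++11 -pthread; of course the avg/min it prints describe only this toy queue, not the real messenger-to-OpWQ path.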
