That's great Haomai, looking forward to this pull request.

Thanks & Regards
Somnath

-----Original Message-----
From: Haomai Wang [mailto:[email protected]] 
Sent: Thursday, October 02, 2014 10:28 PM
To: Somnath Roy
Cc: ceph-devel
Subject: Re: FW: Weekly Ceph Performance Meeting Invitation

On Fri, Oct 3, 2014 at 1:39 AM, Somnath Roy <[email protected]> wrote:
> Please share your opinion on this..
>
> -----Original Message-----
> From: Sage Weil [mailto:[email protected]]
> Sent: Wednesday, October 01, 2014 3:57 PM
> To: Somnath Roy
> Cc: Mark Nelson; Kasper Dieter; Andreas Bluemle; Paul Von-Stamwitz
> Subject: RE: Weekly Ceph Performance Meeting Invitation
>
> On Wed, 1 Oct 2014, Somnath Roy wrote:
>> Yes Sage, it's all reads. Each call to lfn_open() will incur this
>> lookup on an FDCache miss (which will be 99% of cases).
>> The following patch will certainly help the write path (which is
>> exciting!) but not reads, since reads don't go through the transaction path.
>> My understanding is that in the read path only two calls per io go
>> to the filestore: one xattr ("_") followed by a read of the same
>> object. If we could somehow combine these two requests, reads would
>> benefit. I did a prototype earlier that passed the fd (and path) to
>> the replicated pg during the getattr call and reused the same
>> fd/path for the next read. This improved performance as well as
>> cpu usage. But, this is against the objectstore interface logic :-(
>> Basically, the sole purpose of the FDCache is to serve this kind of
>> scenario, but since it is now sharded by object hash (and the
>> FDCache itself is cpu intensive) it is not helping much. Maybe
>> sharding by PG (coll_id) could help here?
>
> I suspect a more fruitful approach would be to make a read-side handle-based 
> API for objectstore... so you can 'open' an object, keep that handle to the 
> ObjectContext, and then do subsequent read operations against that.
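
A read-side handle could look roughly like the sketch below. This is only an
illustration of the idea; the type and method names (ObjectReadHandle,
open_for_read, ...) are hypothetical, not the actual ObjectStore interface.

#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical read-side, handle-based ObjectStore API. Opening resolves
// the object name to an fd/path once; later calls reuse that resolution.
struct ObjectReadHandle {
  virtual ~ObjectReadHandle() = default;
  // Read `len` bytes at `offset` via the already-resolved fd,
  // skipping the per-call LFNIndex lookup.
  virtual int read(uint64_t offset, size_t len, std::vector<char>* out) = 0;
  // Fetch an xattr (e.g. "_") through the same handle.
  virtual int getattr(const std::string& name, std::vector<char>* out) = 0;
};

struct ReadableObjectStore {
  virtual ~ReadableObjectStore() = default;
  // Resolve the name once and hand back a handle the PG can keep for
  // the getattr("_") and the data read that follow.
  virtual std::unique_ptr<ObjectReadHandle>
  open_for_read(const std::string& coll, const std::string& oid) = 0;
};

With something like this the PG pays the name-to-fd resolution once per
object instead of once per filestore call, which is exactly the getattr +
read pairing Somnath describes.
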
>
> Sharding the FDCache per PG would help with lock contention, yes, but is that 
> the limiter or are we burning CPU?
>
>> Also, I don't think the ceph io path is very memory intensive, so we
>> can spare some memory for cache usage. For example, if we had an
>> object_context cache at the ReplicatedPG level (the cache is there
>> now, but the contexts are not persisted), performance (and cpu
>> usage) would improve dramatically. I know that there can be a lot of
>> PGs and thus memory usage can be a challenge, but we can certainly
>> control that by limiting the per-PG cache size and so on. The size
>> of an object_context instance shouldn't be much, I guess. I did some
>> prototyping on that too and got significant improvement. This will
>> eliminate the getattr path in case of a cache hit.
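
For illustration, a minimal sketch of a size-capped object_context cache of
the kind described above; ObjectContext here is a simplified stand-in (keyed
by a plain string instead of hobject_t), not the actual ReplicatedPG code.

#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct ObjectContext {           // simplified stand-in for the real type
  std::string underscore_attr;   // cached "_" xattr: a hit skips getattr
};
using ObjectContextRef = std::shared_ptr<ObjectContext>;

class PGObjectContextCache {     // one instance per PG, size-capped LRU
  size_t max_size_;
  std::list<std::string> lru_;   // most recently used object at the front
  std::unordered_map<std::string,
      std::pair<ObjectContextRef, std::list<std::string>::iterator>> map_;

public:
  explicit PGObjectContextCache(size_t max_size) : max_size_(max_size) {}

  ObjectContextRef lookup(const std::string& oid) {
    auto it = map_.find(oid);
    if (it == map_.end())
      return nullptr;            // miss: caller falls back to getattr
    lru_.splice(lru_.begin(), lru_, it->second.second);  // refresh LRU
    return it->second.first;
  }

  void insert(const std::string& oid, ObjectContextRef obc) {
    if (lookup(oid))             // already cached (lookup refreshed LRU)
      return;
    lru_.push_front(oid);
    map_[oid] = {std::move(obc), lru_.begin()};
    if (map_.size() > max_size_) {        // enforce the per-PG memory cap
      map_.erase(lru_.back());
      lru_.pop_back();
    }
  }
};

Memory stays bounded at max_size contexts per PG, which is the knob Somnath
mentions for keeping total usage under control across many PGs.
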
>
> Can you propose this on ceph-devel?  I think this is promising.  And probably 
> quite easy to implement.

Yes, we have done this implementation and it can shave nearly 100us off each
IO on a cache hit. We will make a pull request next week. :-)

>
>> Another challenge for reads (and probably for writes too) is
>> sequential io in the case of rbd. With the default Linux read_ahead,
>> sequential read performance is significantly lower than random read
>> with the latest code at an io_size of, say, 64K. The obvious reason
>> is that with rbd's default object size of 4MB, a lot of sequential
>> 64K reads land on the same PG and get bottlenecked there.
>> Increasing the read_ahead size improves performance but will affect
>> random workloads. I think a PG-level cache should help here.
>> Striped images from librbd will not face this problem I guess,
>> but krbd does not support striping, so it is definitely a problem there.
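
To make the arithmetic behind that concrete (a toy calculation, not rbd code):

#include <cstdint>
#include <cstdio>

int main() {
  // With rbd's default 4MB objects, a 64K sequential stream issues 64
  // consecutive reads against the same object, and the object name alone
  // determines the PG, so all 64 reads queue up on one PG.
  const uint64_t object_size = 4ULL << 20;       // 4MB rbd object
  const uint64_t io_size = 64ULL << 10;          // 64K client reads
  std::printf("%llu consecutive reads per object/PG\n",
              (unsigned long long)(object_size / io_size));
  return 0;
}
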
>
> I still think the key here is a comprehensive set of IO hints.  Then it's a 
> problem of making sure we are using them effectively...
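
As an aside, a sketch of what passing such a hint per read could look like
from the client side, assuming the fadvise-style op flags that librados
exposes (error handling omitted; treat this as illustrative, not as the
agreed hint design):

#include <rados/librados.hpp>
#include <cstdint>
#include <string>

int hinted_sequential_read(librados::IoCtx& io, const std::string& oid,
                           uint64_t off, uint64_t len,
                           librados::bufferlist* out) {
  librados::ObjectReadOperation op;
  int rval = 0;
  op.read(off, len, out, &rval);
  // Hint that this object is being read sequentially so the OSD side can
  // read ahead / manage its cache, instead of relying only on the client's
  // kernel read_ahead setting.
  op.set_op_flags2(LIBRADOS_OP_FLAG_FADVISE_SEQUENTIAL);
  librados::bufferlist unused;
  return io.operate(oid, &op, &unused);
}
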
>
>> We can discuss these in the next meeting if this sounds interesting.
>
> Yeah, but let's discuss on list first, no reason to wait!
>
> s
>
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:[email protected]]
>> Sent: Wednesday, October 01, 2014 1:14 PM
>> To: Somnath Roy
>> Cc: Mark Nelson; Kasper Dieter; Andreas Bluemle; Paul Von-Stamwitz
>> Subject: RE: Weekly Ceph Performance Meeting Invitation
>>
>> On Wed, 1 Oct 2014, Somnath Roy wrote:
>> > CPU-wise, the following are still hurting us in Giant. A lot of
>> > fixes, like the IndexManager work, went into Giant and helped cpu
>> > consumption as well.
>> >
>> > 1. LFNIndex lookup logic. I have a fix that will save around one
>> > cpu core on that path. I have yet to address the comments made by
>> > Greg/Sam on it, but a lot of improvement can happen here.
>>
>> Have you looked at
>>
>> https://github.com/ceph/ceph/commit/74b1cf8bf1a7a160e6ce14603df63a46b22d8b98
>>
>> The patch is incomplete, but with that change we should be able to drop to a
>> single path lookup per ObjectStore::Transaction (as opposed to one for each
>> op in the transaction that touches the given object).  I'm not sure whether
>> the ops you were looking at had a lot of those or were simple single-io
>> type operations?  That would only help on the write path; I think you said
>> you've been focusing on reads.
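
The gist of that change, as a hypothetical sketch (not the actual FileStore
code): resolve each object name once while applying a transaction and reuse
the resolved fd for every later op that touches the same object.

#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

struct Op {                            // simplified stand-in for a txn op
  std::string oid;
  std::function<void(int fd)> apply;   // performs the write/setattr/etc.
};

// `lookup` stands in for the expensive lfn_open()/LFNIndex resolution.
void apply_transaction(const std::vector<Op>& ops,
                       const std::function<int(const std::string&)>& lookup) {
  std::unordered_map<std::string, int> fds;      // oid -> resolved fd
  for (const auto& op : ops) {
    auto it = fds.find(op.oid);
    if (it == fds.end())                         // first touch: one lookup
      it = fds.emplace(op.oid, lookup(op.oid)).first;
    op.apply(it->second);                        // later ops reuse the fd
  }
  // fds would be closed / returned to the FDCache here.
}
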
>>
>> > 2. The buffer class is very cpu intensive. Fixing that part will
>> > help every ceph component.
>>
>> +1
>>
>> sage
>>



--
Best Regards,

Wheat
