On Fri, Oct 3, 2014 at 1:39 AM, Somnath Roy <[email protected]> wrote:
> Please share your opinion on this..
>
> -----Original Message-----
> From: Sage Weil [mailto:[email protected]]
> Sent: Wednesday, October 01, 2014 3:57 PM
> To: Somnath Roy
> Cc: Mark Nelson; Kasper Dieter; Andreas Bluemle; Paul Von-Stamwitz
> Subject: RE: Weekly Ceph Performance Meeting Invitation
>
> On Wed, 1 Oct 2014, Somnath Roy wrote:
>> Yes Sage, it's all read. Each call to lfn_open() incurs this lookup on an
>> FDCache miss (which is 99% of cases). The following patch will certainly
>> help the write path (which is exciting!) but not reads, since reads do
>> not go through the transaction path. My understanding is that in the read
>> path each IO makes only two calls into the filestore: one xattr getattr
>> ("_") followed by a read of the same object. If we could somehow combine
>> these two requests, reads would benefit. I did some prototyping earlier
>> by passing the fd (and path) to the replicated PG during the getattr call
>> and reusing the same fd/path for the subsequent read. This improved both
>> performance and CPU usage, but it goes against the ObjectStore interface
>> logic :-(
>> Basically, the sole purpose of the FDCache is to serve this kind of
>> scenario, but since it is now sharded by object hash (and the FDCache
>> itself is CPU intensive) it is not helping much. Maybe sharding by PG
>> (collection id) could help here?
>
> I suspect a more fruitful approach would be to make a read-side
> handle-based API for ObjectStore... so you can 'open' an object, keep that
> handle in the ObjectContext, and then do subsequent read operations
> against it.
>
> Sharding the FDCache per PG would help with lock contention, yes, but is
> that the limiter or are we burning CPU?
>
>> Also, I don't think the Ceph IO path is very memory intensive, so we can
>> leverage some memory for caching. For example, if we had an
>> object_context cache at the ReplicatedPG level (the cache exists today,
>> but the contexts are not persisted), performance (and CPU usage) would
>> improve dramatically. I know there can be a lot of PGs, so memory usage
>> can be a challenge, but we can certainly control that by limiting the
>> per-cache size and so on. The size of an object_context instance
>> shouldn't be much, I guess. I did some prototyping on that too and got a
>> significant improvement. This eliminates the getattr path on a cache hit.
>
> Can you propose this on ceph-devel? I think this is promising. And
> probably quite easy to implement.

Yes, we have done this implementation, and it saves nearly 100us per IO on
a cache hit. We will make a pull request next week. :-)
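To give a rough idea of the shape, it is something like the sketch below
(illustrative only: the names, types, and sizes are placeholders, not the
code that will be in the pull request):

// Illustrative sketch only: placeholder names/types, not the actual
// ReplicatedPG code. A small per-PG LRU that keeps recently used object
// contexts (including the decoded "_" attr) so a read hit can skip the
// filestore getattr.
#include <list>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct CachedObjectContext {
  std::string oid;          // stand-in for hobject_t
  std::string object_info;  // stand-in for the decoded "_" xattr
};

class PGObjectContextCache {
  using Entry = std::pair<std::string, std::shared_ptr<CachedObjectContext>>;
  size_t max_size;
  std::list<Entry> lru;                                 // front = most recent
  std::map<std::string, std::list<Entry>::iterator> index;

public:
  explicit PGObjectContextCache(size_t max = 64) : max_size(max) {}

  // Return a cached context or nullptr; a hit avoids one getattr round trip.
  std::shared_ptr<CachedObjectContext> lookup(const std::string& oid) {
    auto it = index.find(oid);
    if (it == index.end())
      return nullptr;
    lru.splice(lru.begin(), lru, it->second);           // refresh recency
    return it->second->second;
  }

  // Insert or refresh a context after a getattr; evict the least recently
  // used entry so per-PG memory stays bounded.
  void insert(std::shared_ptr<CachedObjectContext> ctx) {
    auto it = index.find(ctx->oid);
    if (it != index.end()) {
      lru.erase(it->second);
      index.erase(it);
    }
    lru.emplace_front(ctx->oid, ctx);
    index[ctx->oid] = lru.begin();
    if (lru.size() > max_size) {
      index.erase(lru.back().first);
      lru.pop_back();
    }
  }
};

On a hit, the getattr call into the filestore is skipped entirely and only
the read itself remains.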
>
>> Another challenge for reads (and probably for writes too) is sequential
>> IO with rbd. With the default Linux read_ahead, sequential read
>> performance is significantly lower than random read performance with the
>> latest code at an IO size of, say, 64K. The obvious reason is that with
>> rbd's default object size of 4MB, lots of sequential 64K reads land on
>> the same PG and get bottlenecked there. Increasing the read_ahead size
>> improves performance, but that will affect random workloads. I think a
>> PG-level cache should help here. Striped images from librbd will not
>> face this problem, I guess, but krbd does not support striping, so it is
>> definitely a problem there.
>
> I still think the key here is a comprehensive set of IO hints. Then it's
> a problem of making sure we are using them effectively...
>
>> We can discuss these in the next meeting if this sounds interesting.
>
> Yeah, but let's discuss on list first, no reason to wait!
>
> s
>
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:[email protected]]
>> Sent: Wednesday, October 01, 2014 1:14 PM
>> To: Somnath Roy
>> Cc: Mark Nelson; Kasper Dieter; Andreas Bluemle; Paul Von-Stamwitz
>> Subject: RE: Weekly Ceph Performance Meeting Invitation
>>
>> On Wed, 1 Oct 2014, Somnath Roy wrote:
>> > CPU-wise, the following are still hurting us in Giant. A lot of fixes,
>> > such as the IndexManager work, went into Giant that helped on CPU
>> > consumption as well.
>> >
>> > 1. LFNIndex lookup logic. I have a fix that will save around one CPU
>> > core on that path. I have yet to address the comments Greg/Sam made on
>> > it, but a lot of improvement can happen here.
>>
>> Have you looked at
>>
>> https://github.com/ceph/ceph/commit/74b1cf8bf1a7a160e6ce14603df63a46b22d8b98
>>
>> The patch is incomplete, but with that change we should be able to drop
>> to a single path lookup per ObjectStore::Transaction (as opposed to one
>> for each op in the transaction that touches the given object). I'm not
>> sure if you were looking at ops that had a lot of those, or if they were
>> simple single-IO operations? That would only help on the write path; I
>> think you said you've been focusing on reads.
>>
>> > 2. The buffer class is very CPU intensive. Fixing that will help every
>> > Ceph component.
>>
>> +1
>>
>> sage
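On the read-side handle idea earlier in the thread: just to make sure I
understand the shape, it would be something like the following? (Purely
hypothetical sketch; none of these methods exist in ObjectStore today, and
the output types are simplified.)

// Purely hypothetical sketch of a read-side, handle-based ObjectStore API.
// Nothing here exists in the current ObjectStore interface; it is only the
// rough shape the suggestion implies: resolve/open the object once, then
// issue reads and getattrs against the handle so each op skips the LFN
// lookup.
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

struct ObjectReadHandle;    // opaque; would wrap the resolved path / open fd

class ObjectStoreReadAPI {  // hypothetical extension, not the real class
public:
  virtual ~ObjectStoreReadAPI() {}

  // One LFNIndex lookup + open() per object, instead of one per op.
  virtual std::shared_ptr<ObjectReadHandle> open_for_read(
      const std::string& coll, const std::string& oid) = 0;

  // Subsequent ops reuse the handle instead of re-resolving the path.
  // (Output types simplified; the real interface would use bufferlist.)
  virtual int read(ObjectReadHandle& h, uint64_t off, size_t len,
                   std::string* out) = 0;
  virtual int getattr(ObjectReadHandle& h, const std::string& name,
                      std::string* out) = 0;
};

The object_context cache could then hold the handle alongside the attrs,
which is essentially what the fd/path-passing prototype did, but behind a
proper interface.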
--
Best Regards,
Wheat
