Did any of that testing involve a degraded cluster, backfilling, peering, etc.? A healthy cluster running normally can use up to 4x less memory and CPU than a cluster that is constantly peering and degraded.
On Sat, Aug 12, 2017, 2:40 PM Nick Fisk <[email protected]> wrote:

> I was under the impression the memory requirements for Bluestore would be
> around 2-3GB per OSD regardless of capacity.
>
> CPU wise, I would lean towards working out how much total GHz you require
> and then get whatever CPU you need to get there, but with a preference for
> GHz over cores. Yes, there will be a slight overhead to having more
> threads running on a lower number of cores, but I believe this is fairly
> minimal compared to the speed boost the single-threaded portion of the
> data path in each OSD gets from running on a faster core. Each PG takes a
> lock for each operation, so any other requests for the same PG will queue
> up and be processed sequentially. The faster you can get through this
> stage the better. I'm pretty sure that if you graphed PG activity on an
> average cluster, you would see a strong skew towards a certain number of
> PGs being hit more often than others. I think Mark N has been seeing the
> effects of the PG locking in recent tests.
>
> Also, don't forget to make sure your CPUs are running at c-state C1 and
> max frequency. This can sometimes give up to a 4x reduction in latency.
>
> Also, if you look at the number of threads running on an OSD node, it
> will be in the tens to hundreds; each OSD process itself has several
> threads. So don't assume that 12 OSDs = a 12-core processor.
>
> I did some tests to measure CPU usage per IO, which you may find useful:
>
> http://www.sys-pro.co.uk/how-many-mhz-does-a-ceph-io-need/
>
> I can max out 12x 7.2k disks on an E3 1240 CPU and it's only running at
> about 15-20%.
>
> I haven't done any proper Bluestore tests, but from some rough testing
> the CPU usage wasn't too dissimilar from Filestore.
>
> Depending on whether you are running HDDs or SSDs, and how many per node,
> I would possibly look at the single-socket E3s or E5s.
> Although saying that, the recent AMD and Intel announcements also have
> some potentially interesting single-socket Ceph options in the mix.
>
> Hope that helps.
>
> Nick
>
> > -----Original Message-----
> > From: ceph-users [mailto:[email protected]] On Behalf
> > Of Stijn De Weirdt
> > Sent: 12 August 2017 14:41
> > To: David Turner <[email protected]>; [email protected]
> > Subject: Re: [ceph-users] luminous/bluestore osd memory requirements
> >
> > hi david,
> >
> > sure i understand that. but how bad does it get when you oversubscribe
> > OSDs? if context switching itself is dominant, then using HT should
> > allow running double the number of OSDs on the same CPU (one OSD per HT
> > core); but if the issue is actual CPU cycles, HT won't help that much
> > either (1 OSD per HT core vs 2 OSDs per physical core).
> >
> > i guess the reason for this is that OSD processes have lots of threads?
> >
> > maybe i can run some tests on a ceph test cluster myself ;)
> >
> > stijn
> >
> > On 08/12/2017 03:13 PM, David Turner wrote:
> > > The reason for an entire core per OSD is to avoid context switching
> > > your CPU to death. If you have a quad-core processor with HT, I
> > > wouldn't recommend more than 8 OSDs on the box. I probably would go
> > > with 7 myself to keep one core available for system operations. This
> > > recommendation has nothing to do with GHz. Higher GHz per core will
> > > likely improve your cluster latency. Of course, if your use case only
> > > needs very minimal throughput, there is no need to hit or exceed the
> > > recommendation. The number-of-cores recommendation is not changing
> > > for Bluestore. It might gain a recommendation of how fast your
> > > processor should be, but basing it on GHz per TB is an invitation to
> > > context switch to death.
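David's core-per-OSD rule of thumb above can be written down as a quick capacity check. This is only an illustrative sketch of the heuristic from the thread; the helper name `max_osds` is mine, not anything from Ceph:

```python
def max_osds(physical_cores: int, hyperthreading: bool = True,
             reserve: int = 1) -> int:
    """Rough per-node OSD cap: at most one OSD per hardware thread,
    keeping `reserve` threads free for system operations."""
    threads = physical_cores * (2 if hyperthreading else 1)
    return max(threads - reserve, 0)

# Quad-core with HT: 8 threads. David caps that box at 8 OSDs and
# would run 7 himself to leave a core for the OS.
print(max_osds(4))  # 7
```

Note this is a ceiling against context-switch thrashing, not a throughput guarantee; per-core clock speed still matters for latency, as discussed above.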
> > >
> > > On Sat, Aug 12, 2017, 8:40 AM Stijn De Weirdt
> > > <[email protected]> wrote:
> > >
> > >> hi all,
> > >>
> > >> thanks for all the feedback. it's clear we should stick to the
> > >> 1GB/TB for the memory.
> > >>
> > >> any (changes to) recommendations for the CPU? in particular, is it
> > >> still the rather vague "1 HT core per OSD" (or was it "1 1GHz HT
> > >> core per OSD")? it would be nice if we had some numbers like
> > >> required specint per TB and/or per Gb/s. also, any indication of how
> > >> much more CPU EC uses (10%, 100%, ...)?
> > >>
> > >> i'm aware that this also depends on the use case, but i'll take
> > >> any pointers i can get. we will probably end up overprovisioning,
> > >> but it would be nice if we can avoid a whole cpu (32GB dimms are
> > >> cheap, so lots of ram with a single socket is really possible these
> > >> days).
> > >>
> > >> stijn
> > >>
> > >> On 08/10/2017 05:30 PM, Gregory Farnum wrote:
> > >>> This has been discussed a lot in the performance meetings, so I've
> > >>> added Mark to discuss. My naive recollection is that the
> > >>> per-terabyte recommendation will be more realistic than it was in
> > >>> the past (an effective increase in memory needs), but also that it
> > >>> will be under much better control than previously.
> > >>>
> > >>> On Thu, Aug 10, 2017 at 1:35 AM Stijn De Weirdt
> > >>> <[email protected]> wrote:
> > >>>
> > >>>> hi all,
> > >>>>
> > >>>> we are planning to purchase new OSD hardware, and we are wondering
> > >>>> if for the upcoming luminous with bluestore OSDs, anything wrt the
> > >>>> hardware recommendations from
> > >>>> http://docs.ceph.com/docs/master/start/hardware-recommendations/
> > >>>> will be different, esp. the memory/cpu part. i understand from
> > >>>> colleagues that the async messenger makes a big difference in
> > >>>> memory usage (maybe also cpu load?); but we are also interested
> > >>>> in the "1GB of RAM per TB" recommendation/requirement.
> > >>>>
> > >>>> many thanks,
> > >>>>
> > >>>> stijn
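The memory rules of thumb discussed in the thread (Nick's ~2-3GB per Bluestore OSD versus the documented 1GB of RAM per TB) can be combined into a quick sizing check. This is a hedged sketch, not an official Ceph formula: the helper name and the `headroom` factor are mine, with the 4x figure taken from David's observation at the top of the thread about degraded/peering clusters:

```python
def node_ram_gb(num_osds: int, tb_per_osd: float,
                per_osd_gb: float = 3.0,  # upper end of Nick's 2-3GB/OSD
                per_tb_gb: float = 1.0,   # the 1GB-of-RAM-per-TB rule
                headroom: float = 1.0) -> float:
    """Per-node RAM estimate: take whichever rule of thumb asks for
    more per OSD, then optionally scale for degraded-cluster headroom
    (recovery/peering can cost several times a healthy cluster's usage)."""
    per_osd = max(per_osd_gb, per_tb_gb * tb_per_osd)
    return num_osds * per_osd * headroom

# 12 OSDs of 8TB each: the per-TB rule dominates -> 96GB; with 4x
# degraded-cluster headroom that becomes 384GB.
print(node_ram_gb(12, 8.0))                # 96.0
print(node_ram_gb(12, 8.0, headroom=4.0))  # 384.0
```

For small disks the flat per-OSD figure dominates instead (e.g. 10 OSDs of 1TB still come out at 30GB under the 3GB/OSD floor), which is consistent with Nick's point that the Bluestore figure is roughly capacity-independent.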
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
