To add on to Mark's thoughtful reply - the formula was intended to be used
on a *per-pool* basis for clusters that have a small number of pools.
However, in small or large clusters you may consider scaling up or down per
Mark's suggestion, or using a fixed amount per pool to keep the numbers
(and resource consumption) from growing out of hand as numerous pools are
added.
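For concreteness, the two interpretations discussed in this thread can be
sketched in a few lines of Python. The helper name and structure below are
my own illustration, not anything from Ceph itself; the target of 100 PGs
per OSD, the round-up to a power of two, and the example cluster (24 OSDs,
replica size 3, three pools) come from this thread and the Ceph docs:

```python
# Illustrative sketch only - the function name is hypothetical.
def pgs_per_pool(num_osds, replica_size, num_pools=1, target_per_osd=100):
    """Suggested pg_num for one pool: (OSDs * target) / (size * pools),
    rounded up to the next power of two per the Ceph docs' guidance."""
    raw = num_osds * target_per_osd / (replica_size * num_pools)
    power = 1
    while power < raw:
        power *= 2
    return power

# Brad's cluster from this thread: 24 OSDs, size 3, three pools.
# Per-pool interpretation (each pool sized independently):
print(pgs_per_pool(24, 3))               # 1024 per pool -> 3072 total
# Total-across-pools interpretation (divide by the pool count):
print(pgs_per_pool(24, 3, num_pools=3))  # 512 per pool -> 1536 total
```

Note that the first interpretation reproduces the 3072 total PGs Brad
reports below, while the second roughly halves it.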

As Mark mentioned, a small cluster can safely get "some extra", especially
if you plan on growing the cluster. Increasing the number of PGs can be an
intensive operation, so planning for the future can be beneficial. As you
can see, there are many factors to keep in mind when selecting an optimal
number.

In the end, your mileage may vary, and scale tests or models should be
considered.

~Brian


On Fri, Mar 14, 2014 at 9:18 AM, Mark Nelson <[email protected]> wrote:

> My personal opinion on this (not necessarily the official Inktank
> position) is that I'd rather err on the side of too many PGs for small
> clusters, while I would probably prefer to err on the side of fewer
> (though not insanely so) PGs for larger clusters.
>
> I.e., I suspect that the difference between 2048 and 4096 PGs on a small
> cluster isn't going to be a huge deal, but going from 131072 to 262144 PGs
> on a larger cluster may have bigger effects, especially on the mons.
>
> There are things to consider here that go beyond just monitor workload and
> data distribution though.  A big one is how many objects you expect to have
> vs the number of PGs and what the per PG directory splitting thresholds are
> set to.  The more PGs you have, the more total objects you can place before
> directories get split (at the same split thresholds).  Whether or not you
> are better off with more PGs or higher split thresholds at high object
> counts isn't totally clear yet, especially when factoring in
> backfill/recovery.  These are things we are actively thinking about.
>
> Mark
>
>
> On 03/14/2014 10:18 AM, Karol Kozubal wrote:
>
>> Dan, I think your interpretation is indeed correct.
>>
>> The documentation on this page looks to be saying this.
>> http://ceph.com/docs/master/rados/operations/placement-groups/
>>
>> Increasing the number of placement groups reduces the variance in
>> per-OSD load across your cluster. We recommend approximately 50-100
>> placement groups per OSD to balance out memory and CPU requirements and
>> per-OSD load. For a single pool of objects, you can use the following
>> formula:
>>
>> Then lower on the same page...
>>
>> When using multiple data pools for storing objects, you need to ensure
>> that you balance the number of placement groups per pool with the number
>> of placement groups per OSD so that you arrive at a reasonable total
>> number of placement groups that provides reasonably low variance per OSD
>> without taxing system resources or making the peering process too slow.
>>
>> However, a confirmation from Inktank would be nice.
>>
>> Karol
>>
>>
>> From: Dan Van Der Ster <[email protected]>
>> Date: Friday, March 14, 2014 at 10:55 AM
>> To: "[email protected]" <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Subject: Re: [ceph-users] PG Calculations
>>
>> Hi,
>> Since you didn't get an immediate reply from a developer, I'm going to
>> be bold and repeat my interpretation that the documentation implies,
>> perhaps not clearly enough, that the 50-100 PGs per OSD rule should be
>> applied for the total of all pools, not per pool. I hope a dev will
>> correct me if I'm wrong.
>>
>> With your config you must have an average of ~400 PGs per OSD. Do you
>> find peering/backfilling/recovery to be responsive? How is the CPU and
>> memory usage of your OSDs during backfilling?
>>
>> Cheers, Dan
>>
>> -- Dan van der Ster || Data & Storage Services || CERN IT Department --
>>
>>
>>
>> -------- Original Message --------
>> From: "McNamara, Bradley" <[email protected]>
>> Sent: Thursday, March 13, 2014 08:03 PM
>> To: [email protected]
>> Subject: [ceph-users] PG Calculations
>>
>> There was a very recent thread discussing PG calculations, and it made
>> me doubt my cluster setup.  So, Inktank, please provide some
>> clarification.
>>
>> I followed the documentation, and interpreted that documentation to mean
>> that PG and PGP calculation was based upon a per-pool calculation.  The
>> recent discussion introduced a slightly different formula adding in the
>> total number of pools:
>>
>> # OSDs * 100 / 3
>>
>> vs.
>>
>> # OSDs * 100 / (3 * # pools)
>>
>> My current cluster has 24 OSDs, a replica size of 3, and the standard
>> three pools: RBD, DATA, and METADATA.  My current total PG count is
>> 3072, which by the second formula is way too many.  So, do I have too
>> many?  Does it need to be addressed, or can it wait until I add more
>> OSDs, which will bring the ratio closer to ideal?  I'm currently using
>> only RBD and CephFS, no RadosGW.
>>
>> Thank you!
>>
>> Brad
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
