Re: [ceph-users] WAL/DB size

Igor Fedotov Wed, 14 Aug 2019 04:16:39 -0700

Hi Wido & Hemant.

On 8/14/2019 11:36 AM, Wido den Hollander wrote:


On 8/14/19 9:33 AM, Hemant Sonawane wrote:

Hello guys,

Thank you so much for your responses, I really appreciate it. One more
thing I forgot to mention in my last email: I am going to use this
storage for OpenStack VMs. Is the answer still the same, i.e. should I
use 1GB for the WAL?

WAL 1GB is fine, yes.


I'd like to argue against this for a bit.

Actually, a standalone WAL is only required when you have either a very small fast device (and don't want the DB to use it) or three devices of differing performance behind the OSD (e.g. HDD, SSD, NVMe). In that case the WAL should be located on the fastest one.

For the given use case you just have HDD and NVMe, so DB and WAL can safely be collocated. This means you don't need to allocate a separate volume for the WAL, and hence there is no need to answer the question of how much space the WAL needs. Simply allocate the DB volume and the WAL will be placed there automatically.
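To illustrate the collocated layout, here is a minimal Python sketch that only builds (and does not run) a ceph-volume command line for one OSD. The device path /dev/sdb and the LV name nvme_vg/db_sdb are hypothetical placeholders, and the helper function itself is just an illustration, not something shipped with Ceph. Note that no --block.wal argument is passed unless a third, faster device exists.

```python
import shlex

def ceph_volume_cmd(data_dev, db_lv, wal_lv=None):
    """Build (but do not execute) a ceph-volume command for one BlueStore OSD.

    data_dev, db_lv and wal_lv are placeholders supplied by the caller.
    """
    cmd = ["ceph-volume", "lvm", "create", "--bluestore",
           "--data", data_dev, "--block.db", db_lv]
    # Only add a separate WAL when there is a third, even faster device.
    if wal_lv is not None:
        cmd += ["--block.wal", wal_lv]
    return shlex.join(cmd)

# HDD + NVMe case: DB on NVMe, WAL lands on the DB volume automatically.
print(ceph_volume_cmd("/dev/sdb", "nvme_vg/db_sdb"))
```

Review the printed command before running it on a real host.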


As this is an OpenStack/RBD only use-case I would say that 10GB of DB
per 1TB of disk storage is sufficient.

Given the RocksDB granularity already mentioned in this thread, we tend to prefer some fixed allocation sizes, with 30-60GB being close to optimal.
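For a rough feel of how the rules discussed in this thread compare, here is a small Python sketch. It assumes decimal units (1 TB = 1000 GB), and the drive sizes and the 32GB fixed value are just example inputs, not recommendations.

```python
# Compare the DB sizing rules from this thread for a single OSD.
# Assumption: decimal units, 1 TB = 1000 GB.

def db_gb_per_tb_rule(disk_tb, gb_per_tb=10):
    """10 GB of DB per 1 TB of data disk."""
    return disk_tb * gb_per_tb

def db_gb_percent_rule(disk_tb, percent=4):
    """DB size as a percentage of the data disk (the 4% rule)."""
    return disk_tb * 1000 * percent / 100

FIXED_DB_GB = 32  # example fixed size within the 30-60 GB range

for disk_tb in (4, 6, 12):  # example drive sizes
    print(f"{disk_tb:>2} TB: 10GB/TB -> {db_gb_per_tb_rule(disk_tb)} GB, "
          f"4% -> {db_gb_percent_rule(disk_tb):.0f} GB, "
          f"fixed -> {FIXED_DB_GB} GB")
```

The ratio-based rules grow linearly with the drive size, while the fixed allocation stays constant regardless of the drive.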

Anyway, I suggest using LVM for the DB/WAL volume and maybe starting with a smaller size (e.g. 32GB per OSD), which leaves some spare space on your NVMes and allows you to add more space if needed. (Just to note: removing already allocated but still unused space from an existing OSD and giving it to another/new OSD is a more troublesome task than adding space from a spare volume.)
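A minimal planning sketch in Python for the "start small, keep spare space" approach. The NVMe capacity (960GB) and OSD count (10) are made-up example numbers, and the helper name is hypothetical.

```python
def plan_nvme(nvme_gb, osd_count, db_gb_per_osd=32):
    """Return (allocated_gb, spare_gb) for one NVMe shared by several OSDs."""
    allocated = osd_count * db_gb_per_osd
    if allocated > nvme_gb:
        raise ValueError(
            f"{osd_count} x {db_gb_per_osd} GB does not fit on {nvme_gb} GB")
    return allocated, nvme_gb - allocated

# Example numbers only: one 960 GB NVMe serving 10 HDD-backed OSDs.
allocated, spare = plan_nvme(nvme_gb=960, osd_count=10)
print(f"allocated: {allocated} GB, spare: {spare} GB")
print(f"each DB volume could later grow by up to {spare // 10} GB")
```

Keeping the spare space unallocated in the volume group is what makes later growth the easy direction.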

On Wed, 14 Aug 2019 at 05:54, Mark Nelson <[email protected]
<mailto:[email protected]>> wrote:

     On 8/13/19 3:51 PM, Paul Emmerich wrote:

     > On Tue, Aug 13, 2019 at 10:04 PM Wido den Hollander <[email protected]
     <mailto:[email protected]>> wrote:
     >> I just checked an RGW-only setup. 6TB drive, 58% full, 11.2GB of DB in
     >> use. No slow db in use.
     > random rgw-only setup here: 12TB drive, 77% full, 48GB metadata and
     > 10GB omap for index and whatever.
     >
     > That's 0.5% + 0.1%. And that's a setup that's using mostly erasure
     > coding and small-ish objects.
     >
     >
     >> I've talked with many people from the community and I don't see an
     >> agreement for the 4% rule.
     > agreed, 4% isn't a reasonable default.
     > I've seen setups with even 10% metadata usage, but these are weird
     > edge cases with very small objects on NVMe-only setups (obviously
     > without a separate DB device).
     >
     > Paul


     I agree, and I did quite a bit of the early space usage analysis.  I
     have a feeling that someone was trying to be well-meaning and make a
     simple ratio for users to target that was big enough to handle the
     majority of use cases.  The problem is that reality isn't that simple
     and one-size-fits-all doesn't really work here.


     For RBD you can usually get away with far less than 4%.  A small
     fraction of that is often sufficient.  For tiny (say 4K) RGW objects
     (especially objects with very long names!) you potentially can end up
     using significantly more than 4%. Unfortunately there's no really good
     way for us to normalize this so long as RGW is using OMAP to store
     bucket indexes.  I think the best we can do long run is make it much
     clearer how space is being used on the block/db/wal devices and easier
     for users to shrink/grow the amount of "fast" disk they have on an OSD.
     Alternately we could put bucket indexes into rados objects instead of
     OMAP, but that would be a pretty big project (with its own challenges
     but potentially also with rewards).


     Mark

     _______________________________________________
     ceph-users mailing list
     [email protected] <mailto:[email protected]>
     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Thanks and Regards,

Hemant Sonawane

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
