That was a good way to check for the recovery sleep.  Does your `ceph
status` show 128 PGs backfilling (or at least a number near that)?  The PGs
not yet backfilling will show 'backfill_wait'.
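
In case it helps, here is a rough Python sketch for tallying those states
rather than reading them off by eye. It assumes the output of
`ceph status --format json` carries a `pgmap.pgs_by_state` list with
`state_name` and `count` fields. The helper name and the sample numbers are
made up for illustration, not taken from your cluster.

```python
import json


def backfill_summary(status):
    """Tally PGs by backfill-related state from parsed `ceph status` JSON."""
    counts = {"backfilling": 0, "backfill_wait": 0, "other": 0}
    for entry in status.get("pgmap", {}).get("pgs_by_state", []):
        # A PG state is a '+'-joined list of flags, e.g. active+remapped+backfilling.
        flags = entry["state_name"].split("+")
        if "backfilling" in flags:
            counts["backfilling"] += entry["count"]
        elif "backfill_wait" in flags or "wait_backfill" in flags:
            # Older releases spell the waiting state 'wait_backfill'.
            counts["backfill_wait"] += entry["count"]
        else:
            counts["other"] += entry["count"]
    return counts


# Hypothetical sample; on a live cluster this would be the JSON emitted by
# `ceph status --format json`.
sample = json.loads("""
{"pgmap": {"pgs_by_state": [
    {"state_name": "active+clean", "count": 2000},
    {"state_name": "active+remapped+backfilling", "count": 120},
    {"state_name": "active+remapped+backfill_wait", "count": 8}
]}}
""")

summary = backfill_summary(sample)
print(summary)
```

If the "backfilling" count stays far below the limit you configured, the
limit is probably not what is holding things back.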

On Mon, Feb 26, 2018 at 11:25 AM Oliver Freyermuth <
[email protected]> wrote:

> Am 26.02.2018 um 16:59 schrieb Patrick Donnelly:
> > On Sun, Feb 25, 2018 at 10:26 AM, Oliver Freyermuth
> > <[email protected]> wrote:
> >> Looking with:
> >> ceph daemon osd.2 perf dump
> >> I get:
> >>     "bluefs": {
> >>         "gift_bytes": 0,
> >>         "reclaim_bytes": 0,
> >>         "db_total_bytes": 84760592384,
> >>         "db_used_bytes": 78920024064,
> >>         "wal_total_bytes": 0,
> >>         "wal_used_bytes": 0,
> >>         "slow_total_bytes": 0,
> >>         "slow_used_bytes": 0,
> >> so it seems this is almost exclusively RocksDB usage.
> >>
> >> Is this expected?
> >
> > Yes. The directory entries are stored in the omap of the objects. This
> > will be stored in the RocksDB backend of Bluestore.
> >
> >> Is there a recommendation on how much MDS storage is needed for a
> >> CephFS with 450 TB?
> >
> > It seems in the above test you're using about 1KB per inode (file).
> > Using that you can extrapolate how much space the metadata pool needs
> > based on your file system usage. (If all you're doing is filling the
> > file system with empty files, of course you're going to need an
> > unusually large metadata pool.)
> >
> Many thanks, this helps!
> We naturally hope our users will not do this; this stress test was a
> worst case. Still, the rough number (1 kB per inode) does indeed help a
> lot, as does the increase with file modifications that David laid out.
>
> Is the slow backfilling also normal?
> Will such an increase in storage (from many file modifications) also be
> reduced at some point, i.e. is the database compacted, can one trigger
> that, or is there something like an "SQL vacuum"?
>
> To also answer David's questions in parallel:
> - Concerning the slow backfill, I am only talking about the "metadata
>   OSDs". They are fully SSD backed, and have no separate device for
>   block.db / WAL.
> - I adjusted backfills up to 128 for those metadata OSDs. The cluster is
>   currently fully idle, i.e. no clients are doing anything.
>   There are no slow requests.
>   Since no clients are doing anything and the rest of the cluster is now
>   clean (apart from the two backfilling OSDs), there is also no memory
>   pressure at all right now.
>   The "clean" OSDs are each reading at 7 MB/s, with 5 % CPU load each.
>   The OSDs being backfilled have 3.3 % CPU load and about 250 kB/s of
>   write throughput.
>   Network traffic between the node with the clean OSDs and the OSDs
>   being backfilled is about 1.5 Mbit/s, while significantly more
>   bandwidth is available.
> - Checking sleeps with:
> # ceph -n osd.1 --show-config | grep sleep
> osd_recovery_sleep = 0.000000
> osd_recovery_sleep_hdd = 0.100000
> osd_recovery_sleep_hybrid = 0.025000
> osd_recovery_sleep_ssd = 0.000000
> shows there should be 0 sleep. Or is there another way to query?
>
> Cheers and many thanks for the valuable replies!
>         Oliver
>
>
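
For the sizing question quoted above, a minimal Python sketch of the
arithmetic may be useful. It takes the two `bluefs` counters from the perf
dump and the rough 1 KB per inode figure from the thread. The function names
and the inode count of 100 million are hypothetical placeholders, and the
estimate ignores replication and any growth from file modifications.

```python
def db_fill_fraction(bluefs):
    """Fraction of the BlueFS DB space in use, from the perf dump counters."""
    return bluefs["db_used_bytes"] / bluefs["db_total_bytes"]


def estimate_metadata_bytes(num_inodes, bytes_per_inode=1024):
    """Very rough metadata footprint: inode count times ~1 KB per inode."""
    return num_inodes * bytes_per_inode


# Counters copied from the `ceph daemon osd.2 perf dump` output in the thread.
bluefs = {"db_total_bytes": 84760592384, "db_used_bytes": 78920024064}
print(f"DB space in use: {db_fill_fraction(bluefs):.1%}")

# Hypothetical file count; substitute the number of inodes you expect.
expected_inodes = 100_000_000
estimate = estimate_metadata_bytes(expected_inodes)
print(f"Estimated metadata: {estimate / 1e9:.1f} GB before replication")
```

Multiplying the result by the metadata pool's replica count gives the raw
capacity to plan for.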