That was a good way to check for the recovery sleep. Does your `ceph status` show 128 PGs backfilling (or at least a number near that)? PGs that are queued rather than actively backfilling will show the state `backfill_wait`.
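To make that check less manual, here is a minimal sketch that tallies PG states from the JSON form of the status output. It assumes the `pgmap.pgs_by_state` layout (a list of `state_name`/`count` pairs) and the state names `backfilling` and `backfill_wait`; the sample data is made up for illustration and is not from this cluster.

```python
import json


def count_backfill_states(status: dict) -> dict:
    """Sum PG counts for states that include 'backfilling' or 'backfill_wait'.

    Assumes `status` is the parsed output of `ceph status --format json`
    with a pgmap.pgs_by_state list (an assumption about the JSON layout).
    """
    counts = {"backfilling": 0, "backfill_wait": 0}
    for entry in status.get("pgmap", {}).get("pgs_by_state", []):
        # A state name is a '+'-joined set of flags, e.g. "active+remapped+backfilling".
        flags = entry["state_name"].split("+")
        for key in counts:
            if key in flags:
                counts[key] += entry["count"]
    return counts


# Hypothetical sample, only to show the expected shape of the input.
sample = json.loads("""
{
  "pgmap": {
    "pgs_by_state": [
      {"state_name": "active+clean", "count": 1000},
      {"state_name": "active+remapped+backfilling", "count": 120},
      {"state_name": "active+remapped+backfill_wait", "count": 8},
      {"state_name": "active+undersized+degraded+remapped+backfilling", "count": 4}
    ]
  }
}
""")

print(count_backfill_states(sample))
```

On a live cluster the input would come from something like `json.loads(subprocess.check_output(["ceph", "status", "--format", "json"]))`. If the `backfilling` count is far below the configured backfill limit while `backfill_wait` is non-zero, something other than that limit is throttling.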
On Mon, Feb 26, 2018 at 11:25 AM Oliver Freyermuth <[email protected]> wrote:

> On 26.02.2018 at 16:59, Patrick Donnelly wrote:
> > On Sun, Feb 25, 2018 at 10:26 AM, Oliver Freyermuth
> > <[email protected]> wrote:
> >> Looking with:
> >> ceph daemon osd.2 perf dump
> >> I get:
> >>     "bluefs": {
> >>         "gift_bytes": 0,
> >>         "reclaim_bytes": 0,
> >>         "db_total_bytes": 84760592384,
> >>         "db_used_bytes": 78920024064,
> >>         "wal_total_bytes": 0,
> >>         "wal_used_bytes": 0,
> >>         "slow_total_bytes": 0,
> >>         "slow_used_bytes": 0,
> >> so it seems this is almost exclusively RocksDB usage.
> >>
> >> Is this expected?
> >
> > Yes. The directory entries are stored in the omap of the objects. This
> > will be stored in the RocksDB backend of Bluestore.
> >
> >> Is there a recommendation on how much MDS storage is needed for a
> >> CephFS with 450 TB?
> >
> > It seems in the above test you're using about 1KB per inode (file).
> > Using that you can extrapolate how much space the data pool needs
> > based on your file system usage. (If all you're doing is filling the
> > file system with empty files, of course you're going to need an
> > unusually large metadata pool.)
> >
> Many thanks, this helps!
> We naturally hope our users will not do this; this stress test was a
> worst case. But the rough number (1 kB per inode) does indeed help a lot,
> as does the increase with modifications of the file as laid out by David.
>
> Is the slow backfilling also normal?
> Will such an increase in storage (caused by many file modifications) also
> be reduced at some point, i.e. is the database compacted, can one trigger
> that, is there something like "SQL vacuum"?
>
> To also answer David's questions in parallel:
> - Concerning the slow backfill, I am only talking about the "metadata
>   OSDs". They are fully SSD backed, and have no separate device for
>   block.db / WAL.
> - I adjusted backfills up to 128 for those metadata OSDs. The cluster is
>   currently fully empty, i.e. no clients are doing anything.
>   There are no slow requests.
>   Since no clients are doing anything and the rest of the cluster is now
>   clean (apart from the two backfilling OSDs), right now there is also no
>   memory pressure at all.
>   The "clean" OSDs are reading with 7 MB/s each, with 5 % CPU load each.
>   The OSDs being backfilled have 3.3 % CPU load, and have about 250 kB/s
>   of write throughput.
>   Network traffic between the node with the clean OSDs and the
>   "being-backfilled" OSDs is about 1.5 Mbit/s, while there is
>   significantly more bandwidth available...
> - Checking sleeps with:
>   # ceph -n osd.1 --show-config | grep sleep
>   osd_recovery_sleep = 0.000000
>   osd_recovery_sleep_hdd = 0.100000
>   osd_recovery_sleep_hybrid = 0.025000
>   osd_recovery_sleep_ssd = 0.000000
>   shows there should be 0 sleep. Or is there another way to query?
>
> Cheers and many thanks for the valuable replies!
> Oliver
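For reference, the sizing arithmetic from the quoted thread can be sketched as follows. The counter values are the ones in the `perf dump` output above; the 1000 bytes per inode figure is the rough estimate given in the thread, and the file count in the example is purely hypothetical. Replication is not accounted for, so treat the result as a logical size rather than raw capacity.

```python
def bluefs_db_fill(db_used_bytes: int, db_total_bytes: int) -> float:
    """Fraction of the BlueFS DB space that is in use."""
    return db_used_bytes / db_total_bytes


def metadata_bytes_estimate(num_inodes: int, bytes_per_inode: int = 1000) -> int:
    """Rough metadata footprint, assuming ~1 kB per inode as estimated in the thread."""
    return num_inodes * bytes_per_inode


# Values taken from the quoted `ceph daemon osd.2 perf dump` output.
fill = bluefs_db_fill(78920024064, 84760592384)
print(f"DB fill level: {fill:.1%}")

# Hypothetical file count, for illustration only.
estimate = metadata_bytes_estimate(100_000_000)
print(f"Estimated metadata for 100M files: {estimate / 1e9:.0f} GB (before replication)")
```

File modifications add to the per-inode cost, as noted in the thread, so the default here is a floor rather than a safe provisioning target.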
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
