> Has it worked before, or did it just stop working at some point? What's the
> exact command that fails (and the error message, if there is one)?
It was working through the NFS gateway; I had never tried the CephFS FUSE
mount before. The command is ceph-fuse --id migration /mnt/repo. There is no
error message, it just hangs.
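In the meantime I can try to get more detail out of the hang by running
ceph-fuse in the foreground with verbose client logging, something like this
(assuming the client.migration keyring sits in /etc/ceph/; the --client_fs
option should only matter if there is more than one filesystem):

  ceph-fuse -d --id migration --client_fs=repo --debug-client=20 --debug-ms=1 /mnt/repo

and I will watch 'ceph fs status repo' and 'ceph health detail' on the cluster
side while it is stuck.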
> > For the "too many PGs per OSD" I suppose I have to add some more
> > OSDs, right?
> Either that or reduce the number of PGs. If you had only a few pools I'd
> suggest leaving it to the autoscaler, but not for 13 pools. You can paste
> 'ceph osd df' and 'ceph osd pool ls detail' if you need more input for that.
I already have the autoscaler enabled. Here is the output you asked for:
---
ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 2    hdd  0.90970   1.00000  932 GiB  332 GiB  330 GiB  1.7 MiB  1.4 GiB  600 GiB  35.63  0.88  329      up
 4    hdd  0.90970   1.00000  932 GiB  400 GiB  399 GiB  1.6 MiB  1.5 GiB  531 GiB  42.94  1.07  331      up
 3    hdd  0.45479   1.00000  466 GiB  203 GiB  202 GiB  1.0 MiB  988 MiB  263 GiB  43.57  1.08  206      up
 5    hdd  0.93149   1.00000  932 GiB  379 GiB  378 GiB  1.6 MiB  909 MiB  552 GiB  40.69  1.01  321      up
                       TOTAL  3.2 TiB  1.3 TiB  1.3 TiB  5.9 MiB  4.8 GiB  1.9 TiB  40.30
MIN/MAX VAR: 0.88/1.08  STDDEV: 3.15
---
ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 3 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 24150 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'kubernetes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/92 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 3 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/123 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/132 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/132 flags hashpspool stripe_width 0 application rgw
pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/134 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
pool 7 'repo_data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 30692 lfor 0/30692/30690 flags hashpspool stripe_width 0 application cephfs
pool 8 'repo_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/150 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 9 '.nfs' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/169 flags hashpspool stripe_width 0 application nfs
pool 11 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/592 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
pool 12 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/592 flags hashpspool stripe_width 0 application rgw
pool 13 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/644 flags hashpspool stripe_width 0 application rgw
pool 19 'kubernetes-lan' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 24150 lfor 0/0/15682 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
---
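If I read these numbers correctly, summing pg_num * size over the 13 pools
gives 3 + 64*2 + 11*32*3 = 1187 PG replicas, which matches the PGS column of
'ceph osd df' above (329+331+206+321 = 1187). Spread over only 4 OSDs that is
roughly 300 PGs per OSD on average, and ~330 on the fullest ones, which is
what the "too many PGs per OSD (328 > max 250)" warning compares against the
default mon_max_pg_per_osd of 250.
If adding OSDs is not an option and the autoscaler does not shrink anything on
its own, I suppose I could lower pg_num on some of the smaller 32-PG pools
manually, for example (pool names and values only as an illustration, to be
checked against 'ceph osd pool autoscale-status' first):

  ceph osd pool set default.rgw.control pg_num 8
  ceph osd pool set default.rgw.buckets.non-ec pg_num 8

Would that be a sane approach, or would you rather add OSDs first?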
Regards
Quoting Eugenio Tampieri <[email protected]>:
> Hi Eugen,
> Sorry, but I had some trouble when I signed up and then I was away so
> I missed your reply.
>
>> ceph auth export client.migration
>> [client.migration]
>> key = redacted
>> caps mds = "allow rw fsname=repo"
>> caps mon = "allow r fsname=repo"
>> caps osd = "allow rw tag cephfs data=repo"
>
> For the "too many PGs per OSD" I suppose I have to add some more
> OSDs, right?
>
> Thanks,
>
> Eugenio
>
> -----Original Message-----
> From: Eugen Block <[email protected]>
> Sent: Wednesday, 4 September 2024 10:07
> To: [email protected]
> Subject: [ceph-users] Re: CephFS troubleshooting
>
> Hi, I already responded to your first attempt:
>
> https://lists.ceph.io/hyperkitty/list/[email protected]/message/GS7KJRJP7BAOF66KJM255G27TJ4KG656/
>
> Please provide the requested details.
>
>
> Quoting Eugenio Tampieri <[email protected]>:
>
>> Hello,
>> I'm writing to troubleshoot an otherwise functional Ceph Quincy
>> cluster that has issues with CephFS.
>> I cannot mount it with ceph-fuse (it gets stuck), and if I mount it
>> with NFS I can list the directories but I cannot read or write
>> anything.
>> Here's the output of ceph -s
>>   cluster:
>>     id:     3b92e270-1dd6-11ee-a738-000c2937f0ec
>>     health: HEALTH_WARN
>>             mon ceph-storage-a is low on available space
>>             1 daemons have recently crashed
>>             too many PGs per OSD (328 > max 250)
>>
>>   services:
>>     mon:        5 daemons, quorum ceph-mon-a,ceph-storage-a,ceph-mon-b,ceph-storage-c,ceph-storage-d (age 105m)
>>     mgr:        ceph-storage-a.ioenwq(active, since 106m), standbys: ceph-mon-a.tiosea
>>     mds:        1/1 daemons up, 2 standby
>>     osd:        4 osds: 4 up (since 104m), 4 in (since 24h)
>>     rbd-mirror: 2 daemons active (2 hosts)
>>     rgw:        2 daemons active (2 hosts, 1 zones)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   13 pools, 481 pgs
>>     objects: 231.83k objects, 648 GiB
>>     usage:   1.3 TiB used, 1.8 TiB / 3.1 TiB avail
>>     pgs:     481 active+clean
>>
>>   io:
>>     client:   1.5 KiB/s rd, 8.6 KiB/s wr, 1 op/s rd, 0 op/s wr
>>
>> Best regards,
>>
>> Eugenio Tampieri
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]