Hello, is it possible to restart the rbd-target-api without restarting the entire container?
Ceph version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)

root@vxxx-sx-xx-iscsi0:~# docker ps
CONTAINER ID   IMAGE                                     COMMAND                  CREATED        STATUS        PORTS   NAMES
949dabe059eb   quay.io/ceph/ceph                         "/usr/bin/rbd-target…"   2 months ago   Up 2 months           ceph-c404fafe-767c-11ee-bc37-0509d00921ba-iscsi-sx-xx-vxxxgw-pool0-vxxx-sx-xx-iscsi0-kwjqrn
43a298fe835b   quay.io/ceph/ceph                         "/usr/bin/tcmu-runner"   2 months ago   Up 2 months           ceph-c404fafe-767c-11ee-bc37-0509d00921ba-iscsi-sx-xx-vxxxgw-pool0-vxxx-sx-xx-iscsi0-kwjqrn-tcmu
1eef0e6084a2   quay.io/prometheus/node-exporter:v1.5.0   "/bin/node_exporter …"   6 months ago   Up 6 months           ceph-c404fafe-767c-11ee-bc37-0509d00921ba-node-exporter-vxxx-sx-xx-iscsi0
4d94c24cebec   quay.io/ceph/ceph                         "/usr/bin/ceph-crash…"   6 months ago   Up 6 months           ceph-c404fafe-767c-11ee-bc37-0509d00921ba-crash-vxxx-sx-xx-iscsi0

root@vxxx-sx-xx-iscsi0:~# docker exec -it 949dabe059eb /bin/bash
[root@vxxx-sx-xx-iscsi0 /]# ps auxf
USER       PID   %CPU %MEM     VSZ    RSS TTY   STAT START   TIME COMMAND
root    134572    0.2  0.0   14152   3284 pts/4 Ss   13:42   0:00 /bin/bash
root    134590    0.0  0.0   46800   3460 pts/4 R+   13:42   0:00  \_ ps auxf
root         1    0.0  0.0    1020    676 ?     Ss   Jul14   6:09 /sbin/docker-init -- /usr/bin/rbd-target-api
root         8    0.1  0.3 3785056 253040 ?     Sl   Jul14 152:47 /usr/bin/python3.6 -s /usr/bin/rbd-target-api

Best Regards,
Laszlo Kardos

-----Original Message-----
From: Anthony D'Atri <a...@dreamsnake.net>
Sent: Tuesday, September 30, 2025 6:05 PM
To: Laszlo Budai <las...@componentsoft.eu>
Cc: Kardos László <laszlo.kar...@acetelecom.hu>; ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph GWCLI issue

> The PG numbers are still very low in my opinion. You have 42 OSDs and
> only 614 PGs; that makes roughly 15 PGs/OSD. That's quite far from the
> rule of thumb of 100 PGs/OSD.

I've been trying to clear up this nuance in the docs when I can. The PGS
field in `ceph osd df` and the target value (aka "PG ratio") are for PG
replicas, not PGs, so one has to factor in replication. For EC, the
replication factor is k+m. For a cluster with one pool:

    pg_num = (#OSDs * ratio) / replication
    ratio  = (pg_num * replication) / #OSDs

Round to the nearest power of 2; if in doubt, round up. When you have
multiple pools it gets more complicated. One can use the pgcalc, or
leverage the PG autoscaler. (A short worked sketch for a cluster of this
size follows below.)

That said, the default target of 100 is way too low, especially since
it's a max, not a target as such:

    global  advanced  mon_max_pg_per_osd     600
    global  advanced  mon_target_pg_per_osd  300

> But maybe your problem is located in a different place. You may want to
> check whether all your `rbd-target-api` services are up and running.
> gwcli relies on them.
>
> Kind regards,
> Laszlo Budai
>
>
> On 9/30/25 10:31, Kardos László wrote:
>> Hello,
>> I apologize for sending the wrong pool details earlier.
>> We store the data in the following data pool: xxxx0-data
>>
>> pool 15 'xxxx0-data' erasure profile laurel_ec size 4 min_size 3 crush_rule

If this is a 3+1 pool, note that a value of m=1 is ... fraught. If this
is a 2+2 pool, note that with current releases, EC for RBD is usually a
significant latency liability. Tentacle's fast EC improves that dynamic.
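For illustration only, a minimal sketch of the arithmetic above, with values assumed from this thread (42 OSDs, one dominant EC pool with k+m = 4, and the rule-of-thumb ratio of 100). The last two commands simply apply the config overrides quoted above; the commented pool name is a placeholder, not a recommendation:

    # Sketch only: per-pool pg_num from OSD count, PG-replica ratio and replication.
    osds=42
    ratio=100        # PG replicas per OSD; the thread argues a higher target (e.g. 300) is saner
    replication=4    # k+m for a 2+2 or 3+1 EC profile
    echo $(( osds * ratio / replication ))   # 1050 -> round to the nearest power of 2: 1024

    # Applying the higher per-OSD targets shown above:
    ceph config set global mon_target_pg_per_osd 300
    ceph config set global mon_max_pg_per_osd 600

    # If a pool's pg_num were then raised toward the computed value, it would
    # look roughly like:
    # ceph osd pool set <pool-name> pg_num 1024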
>> 8 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change
>> 30830 lfor 0/0/30825 flags hashpspool,ec_overwrites,selfmanaged_snaps
>> stripe_width 12288 application rbd,rgw
>>
>>   cluster:
>>     id:     c404fafe-767c-11ee-bc37-0509d00921ba
>>     health: HEALTH_OK
>>
>>   services:
>>     mon:         5 daemons, quorum v188-ceph-mgr0,v188-ceph-mgr1,v188-ceph-iscsigw2,v188-ceph6,v188-ceph5 (age 5d)
>>     mgr:         v188-ceph-mgr0.rxcecw(active, since 11w), standbys: v188-ceph-mgr1.hmbuma
>>     mds:         1/1 daemons up, 1 standby
>>     osd:         42 osds: 42 up (since 2M), 42 in (since 3M)
>>     tcmu-runner: 10 portals active (4 hosts)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   11 pools, 614 pgs
>>     objects: 13.63M objects, 51 TiB
>>     usage:   75 TiB used, 71 TiB / 147 TiB avail
>>     pgs:     613 active+clean
>>              1   active+clean+scrubbing+deep
>>
>>   io:
>>     client: 8.1 MiB/s rd, 105 MiB/s wr, 320 op/s rd, 2.31k op/s wr
>>
>> Best Regards,
>> Laszlo Kardos
>>
>> -----Original Message-----
>> From: Eugen Block <ebl...@nde.ag>
>> Sent: Tuesday, September 30, 2025 9:03 AM
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] Re: Ceph GWCLI issue
>>
>> Hi,
>>
>> I don't have an answer for why the image is in an unknown state, but
>> I'd be concerned about the pool's pg_num. You have terabytes in a pool
>> with a single PG? That's awful and should be increased to a more
>> suitable value. I can't say whether that would fix anything regarding
>> the unknown issue, but it's definitely not good at all.
>>
>> What is the overall Ceph status (ceph -s)?
>>
>> Regards,
>> Eugen
>>
>>
>> Quoting Kardos László <laszlo.kar...@acetelecom.hu>:
>>
>>> Hello,
>>>
>>> We have encountered the following issue in our production environment:
>>>
>>> A new RBD image was created within an existing pool, and its status is
>>> reported as "unknown" in GWCLI. Based on our tests, this does not
>>> appear to cause operational issues, but we would like to investigate
>>> the root cause. No relevant information regarding this issue was found
>>> in the logs.
>>>
>>> GWCLI output:
>>>
>>> o- / ................................................................................ [...]
>>>   o- cluster ............................................................... [Clusters: 1]
>>>   | o- ceph ................................................................. [HEALTH_OK]
>>>   | o- pools ................................................................ [Pools: 11]
>>>   | | o- .mgr ....................... [(x3), Commit: 0.00Y/15591725M (0%), Used: 194124K]
>>>   | | o- .nfs ........................ [(x3), Commit: 0.00Y/15591725M (0%), Used: 16924b]
>>>   | | o- xxxx-test .................. [(2+1), Commit: 0.00Y/23727198M (0%), Used: 0.00Y]
>>>   | | o- xxxxx-erasure-0 ..... [(2+1), Commit: 0.00Y/23727198M (0%), Used: 61519257668K]
>>>   | | o- xxxxxx-repl ............... [(x3), Commit: 0.00Y/15591725M (0%), Used: 130084b]
>>>   | | o- cephfs.cephfs-test.data .. [(x3), Commit: 0.00Y/15591725M (0%), Used: 9090444K]
>>>   | | o- cephfs.cephfs-test.meta  [(x3), Commit: 0.00Y/15591725M (0%), Used: 516415713b]
>>>   | | o- xxxxx-data ........... [(3+1), Commit: 0.00Y/9604386M (0%), Used: 7547753556K]
>>>   | | o- xxxxx-rpl ................ [(x3), Commit: 12.0T/4268616M (294%), Used: 85265b]
>>>   | | o- xxxxx-data .......... [(3+1), Commit: 0.00Y/5011626M (0%), Used: 10955179612K]
>>>   | | o- replicated_xxxx ....... [(x3), Commit: 25.0T/2280846592K (1176%), Used: 46912b]
>>>   | o- topology ..................................................... [OSDs: 42, MONs: 5]
>>>   o- disks .......................................................... [37.0T, Disks: 3]
>>>   | o- xxxx-rpl ...................................................... [xxxx-rpl (12.0T)]
>>>   | | o- xxxxx_lun0 ................................ [xxxx-rpl/xxxxx_lun0 (Online, 12.0T)]
>>>   | o- replicated_xxxx ........................................ [replicated_xxxx (25.0T)]
>>>   | | o- xxxx_lun0 .......................... [replicated_xxxx/xxxx_lun0 (Online, 12.0T)]
>>>   | | o- xxxx_lun_new .................... [replicated_xxxx/xxxx_lun_new (Unknown, 13.0T)]
>>>
>>> The image (xxxx_lun_new) is provisioned to multiple ESXi hosts,
>>> mounted, and formatted with VMFS6. The datastore is writable and
>>> readable by the hosts.
>>>
>>> There is a change in the block size of the RBD Image: the older RBD
>>> Images use a 4 MiB block size, while the new RBD Image uses a 512 KiB
>>> block size.
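For context, a minimal sketch relating that "block size" to the image order shown in the rbd info output below (this only restates the 2^order arithmetic; the commented rbd create line is purely hypothetical and its image name is a placeholder):

    # The RBD "block size" here is the object size, i.e. 2^order bytes:
    echo $(( 2 ** 22 ))   # 4194304 -> 4 MiB objects (order 22, the older images)
    echo $(( 2 ** 19 ))   # 524288  -> 512 KiB objects (order 19, the new image)

    # Hypothetical example only: the object size is fixed at creation time,
    # e.g. an image created explicitly with the 4 MiB default would look
    # roughly like:
    # rbd create replicated_xxxx/some_new_lun --size 13T --object-size 4M --data-pool xxxx0-data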
>>> RBD Image Parameters:
>>>
>>> For replicated_xxxx / xxxx_lun0 (Online status in GWCLI):
>>>
>>> rbd image 'xxxx_lun0':
>>>         size 12 TiB in 3145728 objects
>>>         order 22 (4 MiB objects)
>>>         snapshot_count: 0
>>>         id: 5c1b5ecfdfa46
>>>         data_pool: xxxx0-data
>>>         block_name_prefix: rbd_data.14.5c1b5ecfdfa46
>>>         format: 2
>>>         features: exclusive-lock, data-pool
>>>         op_features:
>>>         flags:
>>>         create_timestamp: Tue Jul 8 13:02:11 2025
>>>         access_timestamp: Thu Sep 25 13:49:47 2025
>>>         modify_timestamp: Thu Sep 25 13:50:05 2025
>>>
>>> For replicated_xxxx / xxxx_lun_new (Unknown status in GWCLI):
>>>
>>> rbd image 'xxxx_lun_new':
>>>         size 13 TiB in 27262976 objects
>>>         order 19 (512 KiB objects)
>>>         snapshot_count: 0
>>>         id: 1945d9cf9f41ab
>>>         data_pool: xxxx0-data
>>>         block_name_prefix: rbd_data.14.1945d9cf9f41ab
>>>         format: 2
>>>         features: exclusive-lock, data-pool
>>>         op_features:
>>>         flags:
>>>         create_timestamp: Wed Sep 24 11:21:21 2025
>>>         access_timestamp: Thu Sep 25 13:50:42 2025
>>>         modify_timestamp: Thu Sep 25 13:49:48 2025
>>>
>>> Pool Parameters:
>>>
>>> pool 14 'replicated_xxxx' replicated size 3 min_size 2 crush_rule 7
>>> object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change
>>> 30743 flags hashpspool stripe_width 0 application rbd,rgw
>>>
>>> Ceph version:
>>>
>>> ceph --version
>>> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>>>
>>> Question:
>>>
>>> What could be causing the RBD image (xxxx_lun_new) to appear in an
>>> "unknown" state in GWCLI?

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io