This sounds like a network configuration issue to me. The fact that you
mention SSH'ing into the nodes or running apt-get being slow points to DNS
timeouts. Make sure that you have only an IP address and subnet configured
on your cluster network interface (no gateway or DNS).
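
For example, on a Proxmox node the /etc/network/interfaces stanza for the
cluster NIC would carry nothing but the address. A minimal sketch, where the
interface name enp2s0 and the address are placeholders for your own values:

    auto enp2s0
    iface enp2s0 inet static
        address 172.16.1.10/24
        # deliberately no "gateway" and no "dns-nameservers" lines here

(The CIDR-style "address" line works with the ifupdown2 that Proxmox ships;
classic ifupdown wants separate address and netmask lines.)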

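You can also test the DNS theory directly from a node; if resolution is the
culprit, the lookup itself will stall for several seconds before anything
else happens (deb.debian.org is just an arbitrary external name here):

    time getent hosts deb.debian.org
    cat /etc/resolv.conf

If the lookup hangs, check whether a resolver or search domain from the new
cluster interface crept into /etc/resolv.conf and remove it.
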
On Fri, Jul 18, 2025, 16:48 Anthony Fecarotta <anth...@linehaul.ai> wrote:

> Hello,
>
> I have a test cluster of some mini-PCs. This one, in particular, runs
> Proxmox, and has two Ceph RBDs (LXCs & VMs).
>
> The purpose of this test cluster was to test Docker Swarm. I wanted to get
> the feel for orchestration - our five-node production cluster is very
> simple and Kubernetes would be overkill.
>
> Each node boots off of NVMe, and each node has one OSD on a PCIe Gen 4
> M.2 NVMe drive. I understand this equipment is not optimal, but please
> keep in mind this is a test cluster. All things considered, it was running
> fine for two months; I even made some of our non-critical BETA programs
> available for internal use within our organization.
>
> Yesterday, I connected a 2.5GbE unmanaged switch to the second 2.5GbE NIC
> of each cluster node, creating a private/cluster network for Ceph. Since
> then, every node, VM, LXC, etc. has been moving at a glacial pace. Just to
> give an example, a sudo apt update or just logging in via SSH can take
> sixty seconds.
>
> [global]
> auth_client_required = cephx
> auth_cluster_required = cephx
> auth_service_required = cephx
> cluster_network = 172.16.1.0/24
> fsid = 3c395d5c-7d46-4dc7-ad4b-8a6761f167b0
> mon_allow_pool_delete = true
> mon_host = 192.168.128.156 192.168.128.150 192.168.128.158
> ms_bind_ipv4 = true
> ms_bind_ipv6 = false
> osd_pool_default_min_size = 2
> osd_pool_default_size = 3
> public_network = 192.168.128.156/24
>
> [client]
> keyring = /etc/pve/priv/$cluster.$name.keyring
>
> [client.crash]
> keyring = /etc/pve/ceph/$cluster.$name.keyring
>
> [mds]
> keyring = /var/lib/ceph/mds/ceph-$id/keyring
>
> [mon.asusNuc1]
> public_addr = 192.168.128.150
>
> [mon.chyna2gb]
> public_addr = 192.168.128.156
>
> [mon.chyna4tb]
> public_addr = 192.168.128.158
>
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable chooseleaf_stable 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> # devices
> device 0 osd.0 class nvme
> device 1 osd.1 class nvme
> device 2 osd.2 class nvme
> device 3 osd.3 class nvme
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 zone
> type 10 region
> type 11 root
>
> # buckets
> host chyna2gb {
>     id -3               # do not change unnecessarily
>     id -4 class nvme    # do not change unnecessarily
>     # weight 1.86299
>     alg straw2
>     hash 0              # rjenkins1
>     item osd.0 weight 1.86299
> }
> host chyna4tb {
>     id -5               # do not change unnecessarily
>     id -6 class nvme    # do not change unnecessarily
>     # weight 3.63869
>     alg straw2
>     hash 0              # rjenkins1
>     item osd.1 weight 3.63869
> }
> host nuc {
>     id -7               # do not change unnecessarily
>     id -8 class nvme    # do not change unnecessarily
>     # weight 0.90970
>     alg straw2
>     hash 0              # rjenkins1
>     item osd.2 weight 0.90970
> }
> host asusNuc1 {
>     id -9               # do not change unnecessarily
>     id -10 class nvme   # do not change unnecessarily
>     # weight 3.63869
>     alg straw2
>     hash 0              # rjenkins1
>     item osd.3 weight 3.63869
> }
> root default {
>     id -1               # do not change unnecessarily
>     id -2 class nvme    # do not change unnecessarily
>     # weight 10.05006
>     alg straw2
>     hash 0              # rjenkins1
>     item chyna2gb weight 1.86299
>     item chyna4tb weight 3.63869
>     item nuc weight 0.90970
>     item asusNuc1 weight 3.63869
> }
>
> # rules
> rule replicated_rule {
>     id 0
>     type replicated
>     step take default
>     step chooseleaf firstn 0 type host
>     step emit
> }
> # end crush map
>
> root@asusNuc1:~# ceph osd perf
> osd  commit_latency(ms)  apply_latency(ms)
>   0                   6                  6
>   3                  15                 15
>   2                  23                 23
>   1                   4                  4
>
> To be clear, the "dumb switch" is isolated and not connected to the rest
> of the network.
>
>
> Regards,
> Anthony Fecarotta
> Founder & President
> anth...@linehaul.ai
> 224-339-1182 | (855) 625-0300
> 1 Mid America Plz Flr 3, Oakbrook Terrace, IL 60181
> www.linehaul.ai
> https://www.linkedin.com/in/anthony-fec/
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
