Hi,
your message is a bit difficult to read due to the formatting, just FYI.
Is this the first PVC you're trying to mount? Has it worked before? I
don't know too much about Rook, btw. If the MDS is really up and
running (which you confirmed), the mentioned error message usually
comes down to one of two things: either you're specifying an MDS
server instead of a MON server in the mount command, or it's actually
an auth error.
And given this error message:
mount args: [-t ceph
csi-cephfs-nod...@039a3dba-d55c-476f-90f0-8783a18338aa.main-ceph-fs=/...
It looks like it might be the first case. Or is csi-cephfs-node.1 a
MON node as well?
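A quick way to check, from the Rook toolbox: compare the MON addresses
the cluster actually advertises with the mon_addr list in the failing
mount command, and confirm the key and caps for the client you mention
(I'm just reusing the client name from your listing below):

$ ceph mon dump
$ ceph auth get client.csi-cephfs-node.1

If the addresses in mon_addr= don't match what 'ceph mon dump' shows,
that would point to the first case.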
Regards,
Eugen
Quoting Martin Reid <[email protected]>:
Hey everyone,
I've got a difficult problem with my CephFS that I haven't been able
to make any headway with. Maybe you guys can help?
The problem
I’m trying to provision a volume on a CephFS, using a Ceph cluster
installed on Kubernetes (K3s) using Rook, but I’m running into the
following error (from the Events in |kubectl describe|):

Events:
  Type     Reason                  Age    From                     Message
  ----     ------                  ----   ----                     -------
  Normal   Scheduled               4m24s  default-scheduler        Successfully assigned archie/ceph-loader-7989b64fb5-m8ph6 to archie
  Normal   SuccessfulAttachVolume  4m24s  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-95b6ca46-cf51-4e58-9bb5-114f00aa4267"
  Warning  FailedMount             3m18s  kubelet                  MountVolume.MountDevice failed for volume "pvc-95b6ca46-cf51-4e58-9bb5-114f00aa4267" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph csi-cephfs-nod...@039a3dba-d55c-476f-90f0-8783a18338aa.main-ceph-fs=/volumes/csi/csi-vol-25d616f5-918f-4e15-bfd6-55b866f9aa9f/4bda56a4-5088-451c-90c8-baa83317d5a5 /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/3e10b46e93bcc2c4d3d1b343af01ee628c736ffee7e562e99d478bc397dab10d/globalmount -o mon_addr=10.43.233.111:3300/10.43.237.205:3300/10.43.39.81:3300,secretfile=/tmp/csi/keys/keyfile-2996214224,_netdev] stderr: mount error: no mds (Metadata Server) is up. The cluster might be laggy, or you may not be authorized
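For completeness, the events above come from |kubectl -n archie
describe pod ceph-loader-7989b64fb5-m8ph6|. I can also pull logs from
the CSI node plugin if that would help; I believe the command is
roughly the following (the namespace and label are what I understand
to be Rook's defaults, so correct me if that's wrong):

$ kubectl -n rook-ceph logs -l app=csi-cephfsplugin -c csi-cephfsplugin --tail=100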
I’m kind of new to K8s, and /very/ new to Ceph, so I would love some
advice on how to go about debugging this mess.
General context
*Kubernetes distribution*: K3s
*Kubernetes version(s)*: v1.33.4+k3s1 (master), v1.32.7+k3s1 (workers)
*Ceph*: installed via Rook
*Nodes*: 3
*OS*: Linux (Arch on master, NixOS on workers)
What I’ve checked/tried
*Note*: Since this is a Rook deployment of Ceph (on Kubernetes), all
these checks are performed in the Rook Toolbox
<https://rook.io/docs/rook/latest-release/Troubleshooting/ceph-toolbox/>
container.
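(Concretely, I open a shell in the toolbox with the command below;
|rook-ceph-tools| is the default deployment name from the Rook docs:)

$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash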
MDS status / Ceph cluster health
Even I know this is the first go-to when your Ceph cluster is giving
you issues. I have the Rook toolbox
<https://rook.io/docs/rook/latest-release/Troubleshooting/ceph-toolbox/>
running on my K8s cluster, so I went into the toolbox pod and ran:
$ ceph status
  cluster:
    id:     039a3dba-d55c-476f-90f0-8783a18338aa
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,c,b (age 7d)
    mgr: b(active, since 7d), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 7d), 3 in (since 2w)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 81 pgs
    objects: 47 objects, 3.2 MiB
    usage:   139 MiB used, 502 GiB / 502 GiB avail
    pgs:     81 active+clean

  io:
    client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr
Since the error we started out with was |mount error: no mds (Metadata
Server) is up|, I checked the |ceph status| output above for the
status of the metadata server. As you can see, the MDS daemons are up:
one active, plus a hot standby.
CephFS Status
$ ceph fs status
main-ceph-fs - 0 clients
============
RANK  STATE           MDS             ACTIVITY    DNS   INOS  DIRS  CAPS
 0    active          main-ceph-fs-a  Reqs: 0 /s  143   38    37    0
 0-s  standby-replay  main-ceph-fs-b  Evts: 0 /s  159   30    29    0
                POOL                    TYPE      USED  AVAIL
       main-ceph-fs-metadata          metadata   4176k   158G
      main-ceph-fs-replicated           data        0    158G
main-ceph-fs-main-ceph-fs-replicated    data        0    158G
STANDBY MDS
main-ceph-fs-d
main-ceph-fs-c
MDS version: ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable)
Ceph authorizations for MDS
Since the other part of the error indicated that I might not be
authorized, I wanted to check what the authorizations were:
$ ceph auth ls
mds.main-ceph-fs-a          # main MDS for my CephFS instance
        key: <base64 key>
        caps: [mds] allow
        caps: [mon] allow profile mds
        caps: [osd] allow *
mds.main-ceph-fs-b          # standby MDS for my CephFS instance
        key: <different base64 key>
        caps: [mds] allow
        caps: [mon] allow profile mds
        caps: [osd] allow *
...
client.csi-cephfs-node.1    # the client mentioned in the error message
        key: <another base64 key>
        caps: [mds] allow rw
        caps: [mgr] allow rw
        caps: [mon] allow r
        caps: [osd] allow rwx tag cephfs metadata=*, allow rw tag cephfs data=*
...                         # more after this
Note: |main-ceph-fs| is the name I gave my CephFS file system.
It looks like this should be okay, but I’m not sure. Definitely open
to some more insight here.
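One thing I haven't figured out how to verify is whether the key the
CSI node plugin actually mounts with matches the one registered in
Ceph. My rough plan (assuming Rook's default secret name and field
here, which I'd need to double-check) would be to compare:

$ ceph auth print-key client.csi-cephfs-node.1
$ kubectl -n rook-ceph get secret rook-csi-cephfs-node -o jsonpath='{.data.adminKey}' | base64 -d

Does that sound like a sensible check?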
PersistentVolumeClaim binding
I checked to make sure the PersistentVolume was provisioned
successfully from the PersistentVolumeClaim, and that it bound
appropriately:
$ kubectl get pvc -n archie jellyfin-ceph-pvc
NAME                STATUS   VOLUME                                     CAPACITY
jellyfin-ceph-pvc   Bound    pvc-95b6ca46-cf51-4e58-9bb5-114f00aa4267   180Gi
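I haven't yet inspected the PersistentVolume object itself; if it's
useful, I assume the CSI volume attributes (fsName, clusterID, and so
on) would show up with something like:

$ kubectl get pv pvc-95b6ca46-cf51-4e58-9bb5-114f00aa4267 -o jsonpath='{.spec.csi.volumeAttributes}'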
Changing the PVC size to something smaller
I tried changing the PVC’s size from 180Gi to 1Gi, since I thought
it might be a free-space issue, but the error persisted.
Turning off firewalls
I turned off all firewalls to see if they were the problem, and still no luck.
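If a plain connectivity test would help, I can run something like this
from the worker node (same MON addresses and port as in the mount
args; assuming |nc| is available there):

$ nc -zv 10.43.233.111 3300
$ nc -zv 10.43.237.205 3300
$ nc -zv 10.43.39.81 3300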
I’m not quite sure where to go from here.
What am I missing? What context should I add? What should I try?
What should I check?
Thank you so much in advance,
Martin
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]