Hi,
--
First, let me start with the bonus...
I migrated from hammer => jewel and followed the migration instructions... but
the migration instructions are missing this :
#chown -R ceph:ceph /var/log/ceph
I just discovered this was the reason I could find no logs anywhere about my
current issue :/
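(For completeness: the full set of chowns on each node, with the daemons
stopped, seems to be something like
# chown -R ceph:ceph /var/lib/ceph
# chown -R ceph:ceph /var/log/ceph
the /var/lib/ceph one is in the upgrade notes, the /var/log/ceph one apparently
is not.)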
--
This is maybe the 3rd time this has happened to me... This time I'd like to try
to understand what is going on.
So: ceph-10.2.0-0.el7.x86_64 + CentOS 7.2 here.
Ceph health was happy, but any rbd operation was hanging - hence ceph was
effectively hung, and so were the test VMs running on it.
I placed my VM disks in an EC pool, on top of which I overlaid an SSD-backed
RBD cache pool. The EC pool is defined as a 3+1 pool, with 5 hosts hosting the
OSDs (and the failure domain set to host).
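(For reference, this is the usual cache-tier-over-EC arrangement; the pool and
profile names below are just placeholders, not my real ones:
# ceph osd erasure-code-profile get myprofile   <- shows k=3, m=1, ruleset-failure-domain=host
# ceph osd tier add ecpool ssd-cache
# ceph osd tier cache-mode ssd-cache writeback
# ceph osd tier set-overlay ecpool ssd-cache
)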
"Ceph -w" wasn't displaying new status lines as usual, but ceph health (detail)
wasn't saying anything would be wrong.
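(In hindsight, I suppose the OSD admin sockets would have shown the stuck
requests, with something like
# ceph daemon osd.<id> dump_ops_in_flight
# ceph daemon osd.<id> dump_historic_ops
but with an empty /var/log/ceph and a happy "ceph health", I didn't think of it
at the time.)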
After looking around, I found that the ceph logs were empty on one node, so I
decided to restart the OSDs on that node using : systemctl restart ceph-osd@*
After I did that, ceph -w came back to life, but told me there was a dead
MON - which I restarted too.
I watched some kind of recovery happening, and a few seconds/minutes later,
this is what I now see :
[root@ceph0 ~]# ceph health detail
HEALTH_WARN 4 pgs degraded; 3 pgs recovering; 1 pgs recovery_wait; 4 pgs stuck
unclean; recovery 57/373846 objects degraded (0.015%); recovery 57/110920
unfound (0.051%)
pg 691.65 is stuck unclean for 310704.556119, current state
active+recovery_wait+degraded, last acting [44,99,69,9]
pg 691.1e5 is stuck unclean for 493631.370697, current state
active+recovering+degraded, last acting [77,43,20,99]
pg 691.12a is stuck unclean for 14521.475478, current state
active+recovering+degraded, last acting [42,56,7,106]
pg 691.165 is stuck unclean for 14521.474525, current state
active+recovering+degraded, last acting [21,71,24,117]
pg 691.165 is active+recovering+degraded, acting [21,71,24,117], 15 unfound
pg 691.12a is active+recovering+degraded, acting [42,56,7,106], 1 unfound
pg 691.1e5 is active+recovering+degraded, acting [77,43,20,99], 2 unfound
pg 691.65 is active+recovery_wait+degraded, acting [44,99,69,9], 39 unfound
recovery 57/373846 objects degraded (0.015%)
recovery 57/110920 unfound (0.051%)
Damn.
Last time this happened, I was forced to declare the PGs lost in order to get
back to a "healthy" ceph, because ceph does not want to revert PGs in EC pools.
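(If I remember correctly, what I had to run last time was something like
# ceph pg <pgid> mark_unfound_lost revert   <- refused on the EC pool
# ceph pg <pgid> mark_unfound_lost delete   <- the only variant that was accepted
)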
But one of the VMs then started hanging randomly on disk IOs...
This same VM is now down, and I can't remove its disk from rbd: the "rbd rm" is
hanging at 99%. I could work around that by renaming the image and
re-installing the VM on a new disk, but anyway, I'd like to
understand + fix + make sure this does not happen again.
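(The workaround being roughly
# rbd -p <pool> rename <image> <image>.broken
and then creating a fresh image for the reinstalled VM.)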
We sometimes suffer power cuts here : if merely restarting daemons kills ceph
data, I don't want to imagine what would happen in case of a real power cut...
Back to the unfound objects. There is no down OSD that is still in the cluster
(only one is down - osd.46 - and I took it down myself, after having set its
weight to 0 last week).
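(As far as I can tell from
# ceph osd tree
only osd.46 is down, with weight 0, and everything else is up and in.)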
I can query the PGs, but I don't understand what I see in there.
For instance :
#ceph pg 691.65 query
(...)
"num_objects_missing": 0,
"num_objects_degraded": 39,
"num_objects_misplaced": 0,
"num_objects_unfound": 39,
"num_objects_dirty": 138,
And then for 2 peers I see :
"state": "active+undersized+degraded", ## undersized ???
(...)
"num_objects_missing": 0,
"num_objects_degraded": 138,
"num_objects_misplaced": 138,
"num_objects_unfound": 0,
"num_objects_dirty": 138,
"blocked_by": [],
"up_primary": 44,
"acting_primary": 44
If I look at the "missing" objects, I can see something on some OSDs :
# ceph pg 691.165 list_missing
(...)
{
    "oid": {
        "oid": "rbd_data.8de32431bd7b7.0000000000000ea7",
        "key": "",
        "snapid": -2,
        "hash": 971513189,
        "max": 0,
        "pool": 691,
        "namespace": ""
    },
    "need": "26521'22595",
    "have": "25922'22575",
    "locations": []
}
All of the missing objects show this kind of "need"/"have" discrepancy - if I
read it correctly, these are object versions, and the PG needs a newer version
of the object than the one it can find.
I can see such objects in a "691.165" directory on secondary OSDs, but I do not
see any 691.165 directory on the primary OSD (44)... ?
For instance :
[root@ceph0 ~]# ll
/var/lib/ceph/osd/ceph-21/current/691.165s0_head/*8de32431bd7b7.0000000000000ea7*
-rw-r--r-- 1 ceph ceph 1399392 May 15 13:18
/var/lib/ceph/osd/ceph-21/current/691.165s0_head/rbd\udata.8de32431bd7b7.0000000000000ea7__head_39E81D65__2b3_5843_0
-rw-r--r-- 1 ceph ceph 1399392 May 27 11:07
/var/lib/ceph/osd/ceph-21/current/691.165s0_head/rbd\udata.8de32431bd7b7.0000000000000ea7__head_39E81D65__2b3_ffffffffffffffff_0
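(I guess the next step would be to stop the involved OSDs one at a time and
look at what each shard really contains with ceph-objectstore-tool, something
like
# systemctl stop ceph-osd@21
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
  --journal-path /var/lib/ceph/osd/ceph-21/journal --pgid 691.165s0 --op list
# systemctl start ceph-osd@21
but I'd rather get some advice before poking at the shards directly.)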
Even so : assuming I had somehow lost data on that OSD 44 (how ??), I would
expect ceph to be able to reconstruct the missing data/PG from the erasure
coding (a 3+1 pool should survive the loss of any single shard) or from the RBD
replicas - yet it looks like it's not willing to ??
I already know that telling ceph to forget about the lost PGs is not a good
idea, as it causes the VMs using them to hang afterwards... and I'd prefer to
see ceph as a rock-solid solution that lets one recover from such "usual"
operations... ?
If anyone has ideas, I'd be happy to hear them... should I kill osd.44 for good
and recreate it ?
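(By "kill and recreate" I mean the usual procedure, something along the lines
of
# ceph osd out 44
# systemctl stop ceph-osd@44
# ceph osd crush remove osd.44
# ceph auth del osd.44
# ceph osd rm 44
and then re-adding the disk with ceph-disk/ceph-deploy - unless there is a
smarter way to handle this situation?)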
Thanks
P.S. : I already tried
# ceph tell osd.44 injectargs '--debug-osd 0/5 --debug-filestore 0/5'
or
# ceph tell osd.44 injectargs '--debug-osd 20/20 --debug-filestore 20/20'
but I tried this before I found the bonus at the start of this email, so there
were no logs anywhere anyway...