One other thing to note with this experience is that we do a LOT of RBD snap 
trimming, like hundreds of millions of objects per day added to our snap_trimqs 
globally. All of the unfound objects in these cases were found on other OSDs in 
the cluster with identical contents, but associated with different snapshots. 
In other words, the file contents matched exactly, but the xattrs differed and 
the filenames indicated that the objects belonged to different snapshots.
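
For what it's worth, checking that was roughly a matter of comparing checksums and
xattrs on the filestore copies, something like this (the paths are placeholders, not
the actual PG directory or object file names):

    md5sum /var/lib/ceph/osd/ceph-<id>/current/<pgid>_head/<object-file>
    getfattr -d -m '.' /var/lib/ceph/osd/ceph-<id>/current/<pgid>_head/<object-file>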

Some of the unfound objects belonged to head, so I don't necessarily believe 
that they were in the process of being trimmed, but I imagine there is some 
possibility that this issue is related to snap trimming or deleting snapshots. 
Just more information...

On Thu, 2017-03-30 at 17:13 +0000, Steve Taylor wrote:

Good suggestion, Nick. I actually did that at the time. The "ceph osd map"
output wasn't all that interesting because the OSDs had been marked out and their
PGs had been remapped to new OSDs. The PG mappings looked fine, with each PG mapped
to the right number of new OSDs, but the objects just didn't exist anywhere except
on the OSDs that had been marked out.
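
For reference, that was just something along these lines per object (the pool and
object names are placeholders):

    ceph osd map rbd <object-name>
    # output is roughly: osdmap eNNNN pool 'rbd' (N) object '<object-name>' -> pg N.xxxx -> up [a,b,c] acting [a,b,c]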

The PG queries were a little more useful, but still didn't really help in the 
end. In all cases (unfound objects from 2 OSDs in each of 2 occurrences), the 
PGs showed 5 or so OSDs where they thought the unfound objects might be, one of 
which was an OSD that had been marked out. In both cases we even waited until 
backfilling completed to see if perhaps the missing objects would turn up 
somewhere else, but none ever did.
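
For anyone following along, the commands involved were along these lines (the pgid
is a placeholder):

    ceph health detail            # lists the PGs reporting unfound objects
    ceph pg <pgid> list_unfound   # names the unfound objects in that PG
    ceph pg <pgid> query          # recovery_state includes a might_have_unfound list of candidate OSDs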

In the first instance we were simply able to reattach the 2 OSDs to the cluster 
with 0 weight and recover the unfound objects. The second instance involved 
drive problems and was a little bit trickier. The drives had experienced errors 
and the XFS filesystems had both become corrupt and wouldn't even mount. We 
didn't have any spare drives large enough, so I ended up using dd, ignoring 
errors, to copy the disks to RBDs in a different Ceph cluster. I then kernel 
mapped the RBDs on the host with the failed drives, ran XFS repairs on them, 
mounted them on the OSD directories, started the OSDs, and put them back in the
cluster with 0 weight. I was lucky enough that those objects were available and 
they were recovered. Of course I immediately removed those OSDs once the 
unfound objects cleared up.
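
Roughly, that recovery path was something like the following (pool/image names,
device names, sizes, and OSD IDs are placeholders, and the second cluster's config
is referenced however your setup requires):

    rbd -c /etc/ceph/other-cluster.conf create recovery/osd-<id> --size <MB>  # at least as large as the failed disk
    rbd -c /etc/ceph/other-cluster.conf map recovery/osd-<id>                 # e.g. /dev/rbd0
    dd if=/dev/sd<X> of=/dev/rbd0 bs=4M conv=noerror,sync                     # copy the failing disk, ignoring read errors
    xfs_repair /dev/rbd0
    mount /dev/rbd0 /var/lib/ceph/osd/ceph-<id>
    start ceph-osd id=<id>                          # or whatever your init system uses
    ceph osd crush reweight osd.<id> 0              # back in the cluster with 0 weight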

That's the other interesting aspect of this problem. This cluster had 4TB HGST
drives for its OSDs, but we had to expand it fairly urgently and didn't have 
enough drives. We added two new servers, each with 16 4TB drives and 16 8TB 
HGST He8 drives. In both instances the problems we encountered were with the 
8TB drives. We have since acquired more 4TB drives and have replaced all of the 
8TB drives in the cluster. We have a total of 8 production clusters globally 
and have been running Ceph in production for 2 years. These two occurrences
recently are the only times we've seen these types of issues, and it was 
exclusive to the 8TB OSDs. I'm not sure how that would cause such a problem, 
but it's an interesting data point.

On Thu, 2017-03-30 at 17:33 +0100, Nick Fisk wrote:
Hi Steve,

If you can recreate the issue, or if you can remember the object names, it might be
worth running “ceph osd map” on the objects to see where it thinks they map to.
And/or maybe a pg query might show something?

Nick


________________________________

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799

________________________________
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steve 
Taylor
Sent: 30 March 2017 16:24
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Question about unfound objects

We've had a couple of puzzling experiences recently with unfound
objects, and I wonder if anyone can shed some light.

This happened with Hammer 0.94.7 on a cluster with 1,309 OSDs. Our use
case is exclusively RBD in this cluster, so it's naturally replicated.
The rbd pool size is 3, min_size is 2. The crush map is flat, so each
host is a failure domain. The OSD hosts are 4U Supermicro chassis with
32 OSDs each. Drive failures have caused the OSD count to be 1,309
instead of 1,312.
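
For concreteness, that corresponds to pool settings like these (using the rbd pool
named above):

    ceph osd pool get rbd size       # 3
    ceph osd pool get rbd min_size   # 2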

Twice in the last few weeks we've experienced issues where the cluster
was HEALTH_OK but was frequently getting some blocked requests. In each
of the two occurrences we investigated and discovered that the blocked
requests resulted from two drives in the same host that were
misbehaving (different set of 2 drives in each occurrence). We decided
to remove the misbehaving OSDs and let things backfill to see if that
would address the issue. Removing the drives resulted in a small number
of unfound objects, which was surprising. We were able to add the OSDs
back with 0 weight and recover the unfound objects in both cases, but
removing two OSDs from a single failure domain shouldn't have resulted
in unfound objects in an otherwise healthy cluster, correct?
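
For completeness, adding the OSDs back looked roughly like this (OSD IDs and the
host name are placeholders; the OSD daemons would need to be running again first):

    ceph osd crush add osd.<id> 0 host=<host>   # re-add with weight 0
    ceph osd in osd.<id>
    ceph -w                                     # watch the unfound count drain to zero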


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
