Hi Folks,

We had to delete some unfound objects in our cache tier to get our cluster
working again, but about an hour later we started seeing OSDs crash.

We found that the crashes are caused by our having deleted this object:

 "hit_set_8.3fc_archive_2021-09-09 08:25:58.520768Z_2021-09-09 08:26:18.907234Z"
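For reference, the object name above appears to follow the pattern hit_set_<pgid>_archive_<begin>_<end>. That would make it an archived hit set of PG 8.3fc (pool 8) covering 08:25:58 to 08:26:18 UTC on 2021-09-09. The sketch below splits such a name into those parts. HitSetArchiveName and parse_hit_set_archive_name are made-up names for illustration, not Ceph APIs, and the pattern is inferred only from this one object name.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper type (not part of Ceph): the pieces of a cache-tier
// hit-set archive object name, assuming the pattern
//   hit_set_<pgid>_archive_<begin>_<end>
struct HitSetArchiveName {
  std::string pgid;   // e.g. "8.3fc" = pool 8, placement group 0x3fc
  std::string begin;  // start timestamp of the archived hit set
  std::string end;    // end timestamp of the archived hit set
};

// Returns std::nullopt if the name does not match the assumed pattern.
std::optional<HitSetArchiveName> parse_hit_set_archive_name(const std::string& name) {
  const std::string prefix = "hit_set_";
  const std::string marker = "_archive_";
  if (name.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
  const auto m = name.find(marker, prefix.size());
  if (m == std::string::npos) return std::nullopt;
  const std::string rest = name.substr(m + marker.size());
  // The timestamps contain no '_' (only digits, '-', ':', '.', ' ' and 'Z'),
  // so exactly one '_' should remain, separating begin from end.
  const auto sep = rest.find('_');
  if (sep == std::string::npos || rest.find('_', sep + 1) != std::string::npos)
    return std::nullopt;
  return HitSetArchiveName{name.substr(prefix.size(), m - prefix.size()),
                           rest.substr(0, sep), rest.substr(sep + 1)};
}
```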

The crash log can be found here: https://paste.openstack.org/show/809211/

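Our plan now is to patch the OSD code so that hit_set_trim() skips the
stats update (instead of asserting) when the hit-set archive object's
context cannot be loaded. That should get the OSD back online so that we
can then remove the cache tier: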

diff --git a/src/osd/PrimaryLogPG.cc b/src/osd/PrimaryLogPG.cc
index 3b3e3e59292..a06fec9c269 100644
--- a/src/osd/PrimaryLogPG.cc
+++ b/src/osd/PrimaryLogPG.cc
@@ -13932,11 +13932,13 @@ void PrimaryLogPG::hit_set_trim(OpContextUPtr &ctx, unsigned max)
     updated_hit_set_hist.history.pop_front();

     ObjectContextRef obc = get_object_context(oid, false);
-    ceph_assert(obc);
+    //ceph_assert(obc);
+    if (obc) {
     --ctx->delta_stats.num_objects;
     --ctx->delta_stats.num_objects_hit_set_archive;
     ctx->delta_stats.num_bytes -= obc->obs.oi.size;
     ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
+    }
   }
 }
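Below is a minimal toy model of what the patched loop is meant to do. It uses simplified stand-in types: DeltaStats, SizeLookup and trim_hit_sets are hypothetical and are not Ceph's real classes. In the model, the oldest hit-set archives are still popped from the history, but the stats deltas are applied only for archives whose object context can still be loaded.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <optional>
#include <string>

// Simplified stand-in for the stats deltas touched in hit_set_trim().
struct DeltaStats {
  std::int64_t num_objects = 0;
  std::int64_t num_objects_hit_set_archive = 0;
  std::int64_t num_bytes = 0;
  std::int64_t num_bytes_hit_set_archive = 0;
};

// Returns the archive object's size, or std::nullopt when no object
// context can be loaded (e.g. the object was deleted by hand).
using SizeLookup = std::function<std::optional<std::uint64_t>(const std::string&)>;

// Trims the history down to `max` entries, oldest first.
void trim_hit_sets(std::deque<std::string>& history, unsigned max,
                   const SizeLookup& lookup, DeltaStats& delta) {
  while (history.size() > max) {
    const std::string oid = history.front();
    history.pop_front();  // the entry is trimmed either way
    // Patched behaviour: skip the stats update instead of asserting
    // when the object context is missing.
    if (const auto size = lookup(oid)) {
      --delta.num_objects;
      --delta.num_objects_hit_set_archive;
      delta.num_bytes -= static_cast<std::int64_t>(*size);
      delta.num_bytes_hit_set_archive -= static_cast<std::int64_t>(*size);
    }
  }
}
```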

Has anyone done this before, or does anyone have another workaround to
get the OSD back online?

Thanks in advance,
Ansgar
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
