________________________________________
From: Sage Weil <s...@newdream.net>
Sent: 03 February 2019 18:25
To: Philippe Van Hecke
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Luminous cluster in very bad state need some assistance.

On Sun, 3 Feb 2019, Philippe Van Hecke wrote:
> Hello,
> I am working for BELNET, the Belgian National Research Network.
>
> We currently manage a Luminous Ceph cluster on Ubuntu 16.04
> with 144 HDD OSDs spread across two data centers, with 6 OSD nodes
> in each data center. The OSDs are 4 TB SATA disks.
>
> Last week we had a network incident: the link between our two DCs
> began to flap due to STP flapping. This left our Ceph
> cluster in a very bad state, with many PGs stuck in different states.
> I gave the cluster time to recover, but some OSDs would not restart.
> I read and tried different things found on this mailing list, but
> this made the situation worse, because all my OSDs began
> falling down one by one due to some bad PGs.
>
> I then tried the solution described by our Greek colleagues:
> https://blog.noc.grnet.gr/2016/10/18/surviving-a-ceph-cluster-outage-the-hard-way/
>
> So I set the noout, noscrub and nodeep-scrub flags, which seems to have frozen
> the situation.
>
> The cluster is only used to provide RBD disks to our cloud-compute and
> cloud-storage solutions
> and to our internal KVM VMs.
>
> It seems that only some pools are affected by unclean/unknown/unfound objects,
>
> and everything works well for the other pools (maybe with some speed issues).
>
> I can confirm that the data in the affected pools is completely corrupted.
>
> You can find here
> https://filesender.belnet.be/?s=download&token=1fac6b04-dd35-46f7-b4a8-c851cfa06379
> a tgz file with as much information as I could dump, to give an overview
> of the current state of the cluster.
>
> So I have two questions.
>
> Will removing the affected pools, together with their stuck PGs, also remove
> the defective PGs?

Yes, but don't do that yet!  From a quick look this looks like it can be
worked around.

The first question is why you're hitting the assert on e.g. osd.49:

     0> 2019-02-01 09:23:36.963503 7fb548859e00 -1
/build/ceph-12.2.5/src/osd/PGLog.h: In function 'static void
PGLog::read_log_and_missing(ObjectStore*, coll_t, coll_t, ghobject_t,
const pg_info_t&, PGLog::IndexedLog&, missing_type&, bool,
std::ostringstream&, bool, bool*, const DoutPrefixProvider*,
std::set<std::__cxx11::basic_string<char> >*, bool) [with missing_type =
pg_missing_set<true>; std::ostringstream =
std::__cxx11::basic_ostringstream<char>]' thread 7fb548859e00 time
2019-02-01 09:23:36.961237
/build/ceph-12.2.5/src/osd/PGLog.h: 1354: FAILED
assert(last_e.version.version < e.version.version)

If you can set debug osd = 20 on that osd, start it, and ceph-post-file
the log, that would be helpful.  12.2.5 is a pretty old luminous release,
but I don't see this in the tracker, so a log would be great.
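
In case it helps, a minimal sketch of those two steps, assuming the default log
path and systemd unit names (adjust for your deployment):

    # /etc/ceph/ceph.conf on the node hosting osd.49 -- raise OSD logging
    [osd.49]
        debug osd = 20

    # try to start the OSD so it hits the assert with verbose logging,
    # then upload the resulting log
    systemctl start ceph-osd@49
    ceph-post-file /var/log/ceph/ceph-osd.49.log

ceph-post-file prints an identifier you can paste back into this thread.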

Your priority is probably to get the pools active, though.  For osd.49,
the problematic pg is 11.182, which your pg ls output shows as online and
undersized but usable.  You can use ceph-objectstore-tool --op
export-remove to make a backup and remove it from osd.49, and then that
osd will likely start up.
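
Roughly, with the OSD stopped and assuming the default data path (a FileStore
OSD may also need --journal-path):

    systemctl stop ceph-osd@49
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-49 \
        --pgid 11.182 --op export-remove --file /root/osd49-pg11.182.export
    systemctl start ceph-osd@49

Keep the export file somewhere safe; it can be loaded back later with
--op import if ever needed.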

If you look at 11.ac, your only incomplete pg in pool 11, the
query says

            "down_osds_we_would_probe": [
                49,
                63
            ],

...so if you get that OSD up, that PG should peer.
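
Once osd.49 is back up, you can re-run the query (or just watch ceph -s) to
confirm the peering unblocks, for example:

    ceph pg 11.ac query | grep -A 3 down_osds_we_would_probe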

In pool 12, you have 12.14d

            "down_osds_we_would_probe": [
                9,
                51
            ],

osd.51 won't start due to the same assert but on pg 15.246, and the pg ls
shows that pg is undersized but active, so doing the same --op
export-remove on that osd will hopefully let it start.  I'm guessing the
same will work on the other 12.* pg, but see if it works on 11.182 first
so that pool will be completely up and available.
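
The same recipe for osd.51, again assuming the default paths:

    systemctl stop ceph-osd@51
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-51 \
        --pgid 15.246 --op export-remove --file /root/osd51-pg15.246.export
    systemctl start ceph-osd@51

    # then watch the stuck pgs peer
    ceph pg ls incomplete
    ceph -s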

Let us know how it goes!

sage


Hi Sage, first of all, thanks for your help.

Please find here
https://filesender.belnet.be/?s=download&token=dea0edda-5b6a-4284-9ea1-c1fdf88b65e9
the OSD log with debug info for osd.49. And indeed, if all the buggy OSDs can
restart, that may well solve the issue.
I am also happy that you confirm my understanding that, in the worst case,
removing the pools can also resolve the problem, even if in that case I lose
data but end up with a working cluster.

Kr
Philippe

PS: I don't know and don't want to open a debate about top/bottom posting, but
I would like to know the preference of this list :-)
