I just did this and backfilling started.  Let's see where this takes me.

ceph osd lost 0 --yes-i-really-mean-it
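To keep an eye on the backfill I'm just watching the standard status output for now (nothing beyond the stock ceph CLI):

ceph -w    # stream cluster events while it backfills
ceph -s    # snapshot of recovery/backfill progress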
Regards,
Hong

On Friday, September 15, 2017 12:44 AM, hjcho616 <hjcho...@yahoo.com> wrote:

Ronny,
Working with all of the PGs shown in "ceph health detail", I ran the command below for each PG to export it:

ceph-objectstore-tool --op export --pgid 0.1c \
    --data-path /var/lib/ceph/osd/ceph-0 \
    --journal-path /var/lib/ceph/osd/ceph-0/journal \
    --skip-journal-replay --file 0.1c.export
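The per-PG runs were just a small shell loop along these lines (a rough sketch; the PG list came out of "ceph health detail" and the data/journal paths depend on which OSD actually holds the PG; pg_list.txt is just a scratch file of PG ids):

# rough sketch of the export loop; pg_list.txt holds one PG id per line
# (taken from "ceph health detail"), OSD is whichever OSD holds them
OSD=0
mkdir -p ./export
while read -r pg; do
    ceph-objectstore-tool --op export --pgid "$pg" \
        --data-path /var/lib/ceph/osd/ceph-$OSD \
        --journal-path /var/lib/ceph/osd/ceph-$OSD/journal \
        --skip-journal-replay \
        --file ./export/"$pg".export
done < pg_list.txt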

I have all PGs exported except one: PG 1.28, which is on ceph-4.  This error doesn't make much sense to me.  Looking at the source code at https://github.com/ceph/ceph/blob/master/src/osd/osd_types.cc, that message is telling me struct_v is 1... but I am not sure how it ended up in the default branch of the switch statement when a case for 1 is defined.  I also tried with --skip-journal-replay; it fails with the same error message.

ceph-objectstore-tool --op export --pgid 1.28 \
    --data-path /var/lib/ceph/osd/ceph-4 \
    --journal-path /var/lib/ceph/osd/ceph-4/journal \
    --file 1.28.export

terminate called after throwing an instance of 'std::domain_error'
  what():  coll_t::decode(): don't know how to decode version 1
*** Caught signal (Aborted) **
 in thread 7fabc5ecc940 thread_name:ceph-objectstor
 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (()+0x996a57) [0x55b2d3323a57]
 2: (()+0x110c0) [0x7fabc46d50c0]
 3: (gsignal()+0xcf) [0x7fabc2b08fcf]
 4: (abort()+0x16a) [0x7fabc2b0a3fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fabc33efb3d]
 6: (()+0x5ebb6) [0x7fabc33edbb6]
 7: (()+0x5ec01) [0x7fabc33edc01]
 8: (()+0x5ee19) [0x7fabc33ede19]
 9: (coll_t::decode(ceph::buffer::list::iterator&)+0x21e) [0x55b2d2ff401e]
 10: (DBObjectMap::_Header::decode(ceph::buffer::list::iterator&)+0x125) [0x55b2d31315f5]
 11: (DBObjectMap::check(std::ostream&, bool)+0x279) [0x55b2d3126bb9]
 12: (DBObjectMap::init(bool)+0x288) [0x55b2d3125eb8]
 13: (FileStore::mount()+0x2525) [0x55b2d305ceb5]
 14: (main()+0x28c0) [0x55b2d2c8d400]
 15: (__libc_start_main()+0xf1) [0x7fabc2af62b1]
 16: (()+0x34f747) [0x55b2d2cdc747]
Aborted
Then I wrote a simple script to run the import process... it just creates an OSD per PG.  Basically I ran the commands below for each PG:

mkdir /var/lib/ceph/osd/ceph-5/tmposd_0.1c/
ceph-disk prepare /var/lib/ceph/osd/ceph-5/tmposd_0.1c/
chown -R ceph.ceph /var/lib/ceph/osd/ceph-5/tmposd_0.1c/
ceph-disk activate /var/lib/ceph/osd/ceph-5/tmposd_0.1c/
ceph osd crush reweight osd.$(cat /var/lib/ceph/osd/ceph-5/tmposd_0.1c/whoami) 0
systemctl stop ceph-osd@$(cat /var/lib/ceph/osd/ceph-5/tmposd_0.1c/whoami)
ceph-objectstore-tool --op import --pgid 0.1c \
    --data-path /var/lib/ceph/osd/ceph-$(cat /var/lib/ceph/osd/ceph-5/tmposd_0.1c/whoami) \
    --journal-path /var/lib/ceph/osd/ceph-$(cat /var/lib/ceph/osd/ceph-5/tmposd_0.1c/whoami)/journal \
    --file ./export/0.1c.export
chown -R ceph.ceph /var/lib/ceph/osd/ceph-5/tmposd_0.1c/
systemctl start ceph-osd@$(cat /var/lib/ceph/osd/ceph-5/tmposd_0.1c/whoami)
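The "simple script" was really just that sequence in a loop over the export files, roughly like this (a sketch; the tmposd_* naming under ceph-5 and the ./export directory are from my setup):

# rough sketch of the per-PG import script; tmposd_* naming and ./export are from my setup
for f in ./export/*.export; do
    pg=$(basename "$f" .export)
    dir=/var/lib/ceph/osd/ceph-5/tmposd_$pg
    mkdir "$dir"
    ceph-disk prepare "$dir"/
    chown -R ceph.ceph "$dir"/
    ceph-disk activate "$dir"/
    id=$(cat "$dir"/whoami)
    ceph osd crush reweight osd.$id 0    # keep the temporary OSD from taking data
    systemctl stop ceph-osd@$id
    ceph-objectstore-tool --op import --pgid "$pg" \
        --data-path /var/lib/ceph/osd/ceph-$id \
        --journal-path /var/lib/ceph/osd/ceph-$id/journal \
        --file "$f"
    chown -R ceph.ceph "$dir"/
    systemctl start ceph-osd@$id
done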
Sometimes the import didn't work on the first try, but stopping the OSD and rerunning ceph-objectstore-tool seemed to help when a PG didn't really want to import.
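When that happened, the manual retry was just the following (OSD_ID and PG stand in for whichever temporary OSD and PG were affected):

# manual retry for a PG that refused to import on the first attempt;
# OSD_ID and PG are placeholders for the affected temporary OSD and PG
systemctl stop ceph-osd@$OSD_ID
ceph-objectstore-tool --op import --pgid $PG \
    --data-path /var/lib/ceph/osd/ceph-$OSD_ID \
    --journal-path /var/lib/ceph/osd/ceph-$OSD_ID/journal \
    --file ./export/$PG.export
systemctl start ceph-osd@$OSD_ID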
Unfound messages are gone!  But I still have down+peering, or down+remapped+peering.

# ceph health detail
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 22 pgs down; 1 pgs inconsistent; 22 pgs peering; 22 pgs stuck inactive; 22 pgs stuck unclean; 1 requests are blocked > 32 sec; 1 osds have slow requests; 2 scrub errors; mds cluster is degraded; noout flag(s) set; no legacy OSD present but 'sortbitwise' flag is not set
pg 1.d is stuck inactive since forever, current state down+peering, last acting [11,2]
pg 0.a is stuck inactive since forever, current state down+remapped+peering, last acting [11,7]
pg 2.8 is stuck inactive since forever, current state down+remapped+peering, last acting [11,7]
pg 2.b is stuck inactive since forever, current state down+remapped+peering, last acting [7,11]
pg 1.9 is stuck inactive since forever, current state down+remapped+peering, last acting [11,7]
pg 0.e is stuck inactive since forever, current state down+peering, last acting [11,2]
pg 1.3d is stuck inactive since forever, current state down+remapped+peering, last acting [10,6]
pg 0.2c is stuck inactive since forever, current state down+peering, last acting [1,11]
pg 0.0 is stuck inactive since forever, current state down+remapped+peering, last acting [10,7]
pg 1.2b is stuck inactive since forever, current state down+peering, last acting [1,11]
pg 0.29 is stuck inactive since forever, current state down+peering, last acting [11,6]
pg 1.28 is stuck inactive since forever, current state down+peering, last acting [11,6]
pg 2.3 is stuck inactive since forever, current state down+peering, last acting [11,7]
pg 1.1b is stuck inactive since forever, current state down+remapped+peering, last acting [11,6]
pg 0.d is stuck inactive since forever, current state down+remapped+peering, last acting [7,11]
pg 1.c is stuck inactive since forever, current state down+remapped+peering, last acting [7,11]
pg 0.3b is stuck inactive since forever, current state down+remapped+peering, last acting [10,7]
pg 2.39 is stuck inactive since forever, current state down+remapped+peering, last acting [10,7]
pg 1.3a is stuck inactive since forever, current state down+remapped+peering, last acting [10,7]
pg 0.5 is stuck inactive since forever, current state down+peering, last acting [11,7]
pg 1.4 is stuck inactive since forever, current state down+peering, last acting [11,7]
pg 0.1c is stuck inactive since forever, current state down+peering, last acting [11,6]
pg 1.d is stuck unclean since forever, current state down+peering, last acting [11,2]
pg 0.a is stuck unclean since forever, current state down+remapped+peering, last acting [11,7]
pg 2.8 is stuck unclean since forever, current state down+remapped+peering, last acting [11,7]
pg 2.b is stuck unclean since forever, current state down+remapped+peering, last acting [7,11]
pg 1.9 is stuck unclean since forever, current state down+remapped+peering, last acting [11,7]
pg 0.e is stuck unclean since forever, current state down+peering, last acting [11,2]
pg 1.3d is stuck unclean since forever, current state down+remapped+peering, last acting [10,6]
pg 0.d is stuck unclean since forever, current state down+remapped+peering, last acting [7,11]
pg 1.c is stuck unclean since forever, current state down+remapped+peering, last acting [7,11]
pg 0.3b is stuck unclean since forever, current state down+remapped+peering, last acting [10,7]
pg 1.3a is stuck unclean since forever, current state down+remapped+peering, last acting [10,7]
pg 2.39 is stuck unclean since forever, current state down+remapped+peering, last acting [10,7]
pg 0.5 is stuck unclean since forever, current state down+peering, last acting [11,7]
pg 1.4 is stuck unclean since forever, current state down+peering, last acting [11,7]
pg 0.1c is stuck unclean since forever, current state down+peering, last acting [11,6]
pg 1.1b is stuck unclean since forever, current state down+remapped+peering, last acting [11,6]
pg 2.3 is stuck unclean since forever, current state down+peering, last acting [11,7]
pg 0.0 is stuck unclean since forever, current state down+remapped+peering, last acting [10,7]
pg 1.28 is stuck unclean since forever, current state down+peering, last acting [11,6]
pg 0.29 is stuck unclean since forever, current state down+peering, last acting [11,6]
pg 1.2b is stuck unclean since forever, current state down+peering, last acting [1,11]
pg 0.2c is stuck unclean since forever, current state down+peering, last acting [1,11]
pg 0.2c is down+peering, acting [1,11]
pg 1.2b is down+peering, acting [1,11]
pg 0.29 is down+peering, acting [11,6]
pg 1.28 is down+peering, acting [11,6]
pg 0.0 is down+remapped+peering, acting [10,7]
pg 2.3 is down+peering, acting [11,7]
pg 1.1b is down+remapped+peering, acting [11,6]
pg 0.1c is down+peering, acting [11,6]
pg 2.39 is down+remapped+peering, acting [10,7]
pg 1.3a is down+remapped+peering, acting [10,7]
pg 0.3b is down+remapped+peering, acting [10,7]
pg 1.3d is down+remapped+peering, acting [10,6]
pg 2.7 is active+clean+inconsistent, acting [2,11]
pg 1.4 is down+peering, acting [11,7]
pg 0.5 is down+peering, acting [11,7]
pg 1.9 is down+remapped+peering, acting [11,7]
pg 2.b is down+remapped+peering, acting [7,11]
pg 2.8 is down+remapped+peering, acting [11,7]
pg 0.a is down+remapped+peering, acting [11,7]
pg 1.d is down+peering, acting [11,2]
pg 1.c is down+remapped+peering, acting [7,11]
pg 0.d is down+remapped+peering, acting [7,11]
pg 0.e is down+peering, acting [11,2]
1 ops are blocked > 8388.61 sec on osd.10
1 osds have slow requests
2 scrub errors
mds cluster is degraded
mds.MDS1.2 at 192.168.1.20:6801/3142084617 rank 0 is replaying journal
noout flag(s) set

What would be the next step?
Regards,
Hong

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
