I've moved a whole set of OSD's from one server to another after a filure.
All other OSD's have come up no problems and Ceph is busy rebalancing itself
after the change, but one of my OSD's is being difficult. Below is the output
from the log file when I use 'service ceph restart osd.35' to try and bring it
online.
I can see any usefull errors here to start chasing down, can you?
-32> 2016-04-04 16:07:13.425771 7f28d457f880 2 journal read_entry
4048896000 : seq 283263182 137 bytes
-31> 2016-04-04 16:07:13.425780 7f28d457f880 2 journal read_entry
4048900096 : seq 283263183 137 bytes
-30> 2016-04-04 16:07:13.425791 7f28d457f880 2 journal No further valid
entries found, journal is most likely valid
-29> 2016-04-04 16:07:13.425802 7f28d457f880 2 journal No further valid
entries found, journal is most likely valid
-28> 2016-04-04 16:07:13.425809 7f28d457f880 3 journal journal_replay: end
of journal, done.
-27> 2016-04-04 16:07:13.431977 7f28d457f880 1 journal _open
/var/lib/ceph/osd/ceph-35/journal fd 20: 5367660544 bytes, block size 4096
bytes, directio = 1, aio = 1
-26> 2016-04-04 16:07:13.432317 7f28d457f880 2 osd.35 0 boot
-25> 2016-04-04 16:07:13.462245 7f28d457f880 0 <cls>
cls/hello/cls_hello.cc:271: loading cls_hello
-24> 2016-04-04 16:07:13.463479 7f28d457f880 1 <cls>
cls/log/cls_log.cc:312: Loaded log class!
-23> 2016-04-04 16:07:13.468017 7f28d457f880 1 <cls>
cls/refcount/cls_refcount.cc:231: Loaded refcount class!
-22> 2016-04-04 16:07:13.468228 7f28d457f880 1 <cls>
cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
-21> 2016-04-04 16:07:13.472035 7f28d457f880 1 <cls>
cls/rgw/cls_rgw.cc:3047: Loaded rgw class!
-20> 2016-04-04 16:07:13.472480 7f28d457f880 1 <cls>
cls/statelog/cls_statelog.cc:306: Loaded log class!
-19> 2016-04-04 16:07:13.472890 7f28d457f880 1 <cls>
cls/user/cls_user.cc:367: Loaded user class!
-18> 2016-04-04 16:07:13.473231 7f28d457f880 1 <cls>
cls/version/cls_version.cc:227: Loaded version class!
-17> 2016-04-04 16:07:13.565117 7f28d457f880 0 osd.35 3612 crush map has
features 1107558400, adjusting msgr requires for clients
-16> 2016-04-04 16:07:13.565139 7f28d457f880 0 osd.35 3612 crush map has
features 1107558400 was 8705, adjusting msgr requires for mons
-15> 2016-04-04 16:07:13.565149 7f28d457f880 0 osd.35 3612 crush map has
features 1107558400, adjusting msgr requires for osds
-14> 2016-04-04 16:07:13.565171 7f28d457f880 0 osd.35 3612 load_pgs
-13> 2016-04-04 16:07:36.965045 7f28d457f880 5 osd.35 pg_epoch: 3604
pg[34.2(unlocked)] enter Initial
-12> 2016-04-04 16:07:36.965642 7f28d457f880 5 osd.35 pg_epoch: 3604
pg[34.2( empty local-les=0 n=0 ec=709 les/c 3578/3582 3588/3588/3542) [38,35]
r=1 lpr=0 pi=3542-3587/1 crt=0'0 inactive NOTIFY] exit Initial 0.000598 0
0.000000
-11> 2016-04-04 16:07:36.965691 7f28d457f880 5 osd.35 pg_epoch: 3604
pg[34.2( empty local-les=0 n=0 ec=709 les/c 3578/3582 3588/3588/3542) [38,35]
r=1 lpr=0 pi=3542-3587/1 crt=0'0 inactive NOTIFY] enter Reset
-10> 2016-04-04 16:07:36.987602 7f28d457f880 5 osd.35 pg_epoch: 3587
pg[34.3(unlocked)] enter Initial
-9> 2016-04-04 16:07:37.011658 7f28d457f880 5 osd.35 pg_epoch: 3587
pg[34.3( v 3334'2172 (0'0,3334'2172] local-les=3562 n=0 ec=709 les/c 3562/3587
3516/3516/3459) [35,34] r=0 lpr=0 crt=3334'2170 lcod 0'0 mlcod 0'0 inactive]
exit Initial 0.024056 0 0.000000
-8> 2016-04-04 16:07:37.011696 7f28d457f880 5 osd.35 pg_epoch: 3587
pg[34.3( v 3334'2172 (0'0,3334'2172] local-les=3562 n=0 ec=709 les/c 3562/3587
3516/3516/3459) [35,34] r=0 lpr=0 crt=3334'2170 lcod 0'0 mlcod 0'0 inactive]
enter Reset
-7> 2016-04-04 16:07:37.067070 7f28d457f880 5 osd.35 pg_epoch: 3481
pg[34.7(unlocked)] enter Initial
-6> 2016-04-04 16:07:37.097469 7f28d457f880 5 osd.35 pg_epoch: 3481
pg[34.7( v 3334'2635 (0'0,3334'2635] local-les=3465 n=0 ec=709 les/c 3465/3479
3459/3459/3369) [34,35] r=1 lpr=0 pi=3448-3458/1 crt=3334'2635 lcod 0'0
inactive NOTIFY] exit Initial 0.030398 0 0.000000
-5> 2016-04-04 16:07:37.097531 7f28d457f880 5 osd.35 pg_epoch: 3481
pg[34.7( v 3334'2635 (0'0,3334'2635] local-les=3465 n=0 ec=709 les/c 3465/3479
3459/3459/3369) [34,35] r=1 lpr=0 pi=3448-3458/1 crt=3334'2635 lcod 0'0
inactive NOTIFY] enter Reset
-4> 2016-04-04 16:07:37.146741 7f28d457f880 5 osd.35 pg_epoch: 3582
pg[34.8(unlocked)] enter Initial
-3> 2016-04-04 16:07:37.172346 7f28d457f880 5 osd.35 pg_epoch: 3582
pg[34.8( v 3334'2042 (0'0,3334'2042] lb 0//0//-1 local-les=3494 n=0 ec=709
les/c 3494/3502 3542/3542/3368) [33,38] r=-1 lpr=0 pi=3368-3541/4 crt=3334'2042
lcod 0'0 inactive NOTIFY] exit Initial 0.025604 0 0.000000
-2> 2016-04-04 16:07:37.172387 7f28d457f880 5 osd.35 pg_epoch: 3582
pg[34.8( v 3334'2042 (0'0,3334'2042] lb 0//0//-1 local-les=3494 n=0 ec=709
les/c 3494/3502 3542/3542/3368) [33,38] r=-1 lpr=0 pi=3368-3541/4 crt=3334'2042
lcod 0'0 inactive NOTIFY] enter Reset
-1> 2016-04-04 16:07:37.189150 7f28d457f880 5 osd.35 pg_epoch: 3585
pg[34.9(unlocked)] enter Initial
0> 2016-04-04 16:07:37.268571 7f28d457f880 -1 *** Caught signal (Aborted)
**
in thread 7f28d457f880
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: /usr/bin/ceph-osd() [0xac5c32]
2: (()+0xf130) [0x7f28d2f0d130]
3: (gsignal()+0x37) [0x7f28d19275d7]
4: (abort()+0x148) [0x7f28d1928cc8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f28d222b9b5]
6: (()+0x5e926) [0x7f28d2229926]
7: (()+0x5e953) [0x7f28d2229953]
8: (()+0x5eb73) [0x7f28d2229b73]
9: (pg_log_entry_t::decode_with_checksum(ceph::buffer::list::iterator&)+0x230)
[0x792510]
10: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t
const&, std::map<eversion_t, hobject_t, std::less<eversion_t>,
std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&,
pg_missing_t&, std::basic_ostringstream<char, std::char_traits<char>,
std::allocator<char> >&, std::set<std::string, std::less<std::string>,
std::allocator<std::string> >*)+0xa2f) [0x76b9bf]
11: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x34f) [0x7edebf]
12: (OSD::load_pgs()+0xa7a) [0x6b6a8a]
13: (OSD::init()+0x729) [0x6b9319]
14: (main()+0x27f3) [0x643ed3]
15: (__libc_start_main()+0xf5) [0x7f28d1913af5]
16: /usr/bin/ceph-osd() [0x65d139]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.35.log
--- end dump of recent events ---
[root@cephosd4 ~]#
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com