Ok, we seem to have traced the issue down to this

"    -4> 2016-04-05 00:59:37.790223 7fbde47a0880 20 read_log 1270'2332 
(1270'911) delete   50686c09/rb.0.6f57.2506ccfd.00000000bb2a/head//34 by 
client.95981.0:47944 2015-10-13 16:47:23.620857
    -3> 2016-04-05 00:59:37.790236 7fbde47a0880 20 read_log 1270'2333 
(1270'917) delete   7c7a1409/rb.0.6f57.2506ccfd.00000000bcc6/head//34 by 
client.95981.0:48356 2015-10-13 16:47:24.523262
    -2> 2016-04-05 00:59:37.790256 7fbde47a0880 20 read_log 1270'2334 
(1270'983) delete   483dbe09/rb.0.6f57.2506ccfd.00000000c9e9/head//34 by 
client.95981.0:51720 2015-10-13 16:47:31.584863
    -1> 2016-04-05 00:59:37.790269 7fbde47a0880 20 read_log 1270'2335 
(1270'985) delete   85411809/rb.0.6f57.2506ccfd.00000000cd66/head//34 by 
client.95981.0:52614 2015-10-13 16:47:33.278702
     0> 2016-04-05 00:59:37.793431 7fbde47a0880 -1 *** Caught signal (Aborted) 
**
in thread 7fbde47a0880

ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: /usr/bin/ceph-osd() [0xac5c32]
2: (()+0xf130) [0x7fbde312e130]
3: (gsignal()+0x37) [0x7fbde1b485d7]
4: (abort()+0x148) [0x7fbde1b49cc8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) ["

Here is a link to the full log of the OSD starting with debugging enabled
https://www.dropbox.com/s/i1x2vqcgp7qkr96/ceph-osd.35-3.log?dl=0


From: ceph-users [mailto:[email protected]] On Behalf Of Cory 
Hawkless
Sent: Monday, 4 April 2016 6:27 PM
To: [email protected]
Subject: [ceph-users] OSD not coming up

I've moved a whole set of OSD's from one server to another after a filure.
All other OSD's have come up no problems and Ceph is busy rebalancing itself 
after the change, but one of my OSD's is being difficult. Below is the output 
from the log file when I use 'service ceph restart osd.35' to try and bring it 
online.
I can see any usefull errors here to start chasing down, can you?


   -32> 2016-04-04 16:07:13.425771 7f28d457f880  2 journal read_entry 
4048896000 : seq 283263182 137 bytes
   -31> 2016-04-04 16:07:13.425780 7f28d457f880  2 journal read_entry 
4048900096 : seq 283263183 137 bytes
   -30> 2016-04-04 16:07:13.425791 7f28d457f880  2 journal No further valid 
entries found, journal is most likely valid
   -29> 2016-04-04 16:07:13.425802 7f28d457f880  2 journal No further valid 
entries found, journal is most likely valid
   -28> 2016-04-04 16:07:13.425809 7f28d457f880  3 journal journal_replay: end 
of journal, done.
   -27> 2016-04-04 16:07:13.431977 7f28d457f880  1 journal _open 
/var/lib/ceph/osd/ceph-35/journal fd 20: 5367660544 bytes, block size 4096 
bytes, directio = 1, aio = 1
   -26> 2016-04-04 16:07:13.432317 7f28d457f880  2 osd.35 0 boot
   -25> 2016-04-04 16:07:13.462245 7f28d457f880  0 <cls> 
cls/hello/cls_hello.cc:271: loading cls_hello
   -24> 2016-04-04 16:07:13.463479 7f28d457f880  1 <cls> 
cls/log/cls_log.cc:312: Loaded log class!
   -23> 2016-04-04 16:07:13.468017 7f28d457f880  1 <cls> 
cls/refcount/cls_refcount.cc:231: Loaded refcount class!
   -22> 2016-04-04 16:07:13.468228 7f28d457f880  1 <cls> 
cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
   -21> 2016-04-04 16:07:13.472035 7f28d457f880  1 <cls> 
cls/rgw/cls_rgw.cc:3047: Loaded rgw class!
   -20> 2016-04-04 16:07:13.472480 7f28d457f880  1 <cls> 
cls/statelog/cls_statelog.cc:306: Loaded log class!
   -19> 2016-04-04 16:07:13.472890 7f28d457f880  1 <cls> 
cls/user/cls_user.cc:367: Loaded user class!
   -18> 2016-04-04 16:07:13.473231 7f28d457f880  1 <cls> 
cls/version/cls_version.cc:227: Loaded version class!
   -17> 2016-04-04 16:07:13.565117 7f28d457f880  0 osd.35 3612 crush map has 
features 1107558400, adjusting msgr requires for clients
   -16> 2016-04-04 16:07:13.565139 7f28d457f880  0 osd.35 3612 crush map has 
features 1107558400 was 8705, adjusting msgr requires for mons
   -15> 2016-04-04 16:07:13.565149 7f28d457f880  0 osd.35 3612 crush map has 
features 1107558400, adjusting msgr requires for osds
   -14> 2016-04-04 16:07:13.565171 7f28d457f880  0 osd.35 3612 load_pgs
   -13> 2016-04-04 16:07:36.965045 7f28d457f880  5 osd.35 pg_epoch: 3604 
pg[34.2(unlocked)] enter Initial
   -12> 2016-04-04 16:07:36.965642 7f28d457f880  5 osd.35 pg_epoch: 3604 
pg[34.2( empty local-les=0 n=0 ec=709 les/c 3578/3582 3588/3588/3542) [38,35] 
r=1 lpr=0 pi=3542-3587/1 crt=0'0 inactive NOTIFY] exit Initial 0.000598 0 
0.000000
   -11> 2016-04-04 16:07:36.965691 7f28d457f880  5 osd.35 pg_epoch: 3604 
pg[34.2( empty local-les=0 n=0 ec=709 les/c 3578/3582 3588/3588/3542) [38,35] 
r=1 lpr=0 pi=3542-3587/1 crt=0'0 inactive NOTIFY] enter Reset
   -10> 2016-04-04 16:07:36.987602 7f28d457f880  5 osd.35 pg_epoch: 3587 
pg[34.3(unlocked)] enter Initial
    -9> 2016-04-04 16:07:37.011658 7f28d457f880  5 osd.35 pg_epoch: 3587 
pg[34.3( v 3334'2172 (0'0,3334'2172] local-les=3562 n=0 ec=709 les/c 3562/3587 
3516/3516/3459) [35,34] r=0 lpr=0 crt=3334'2170 lcod 0'0 mlcod 0'0 inactive] 
exit Initial 0.024056 0 0.000000
    -8> 2016-04-04 16:07:37.011696 7f28d457f880  5 osd.35 pg_epoch: 3587 
pg[34.3( v 3334'2172 (0'0,3334'2172] local-les=3562 n=0 ec=709 les/c 3562/3587 
3516/3516/3459) [35,34] r=0 lpr=0 crt=3334'2170 lcod 0'0 mlcod 0'0 inactive] 
enter Reset
    -7> 2016-04-04 16:07:37.067070 7f28d457f880  5 osd.35 pg_epoch: 3481 
pg[34.7(unlocked)] enter Initial
    -6> 2016-04-04 16:07:37.097469 7f28d457f880  5 osd.35 pg_epoch: 3481 
pg[34.7( v 3334'2635 (0'0,3334'2635] local-les=3465 n=0 ec=709 les/c 3465/3479 
3459/3459/3369) [34,35] r=1 lpr=0 pi=3448-3458/1 crt=3334'2635 lcod 0'0 
inactive NOTIFY] exit Initial 0.030398 0 0.000000
    -5> 2016-04-04 16:07:37.097531 7f28d457f880  5 osd.35 pg_epoch: 3481 
pg[34.7( v 3334'2635 (0'0,3334'2635] local-les=3465 n=0 ec=709 les/c 3465/3479 
3459/3459/3369) [34,35] r=1 lpr=0 pi=3448-3458/1 crt=3334'2635 lcod 0'0 
inactive NOTIFY] enter Reset
    -4> 2016-04-04 16:07:37.146741 7f28d457f880  5 osd.35 pg_epoch: 3582 
pg[34.8(unlocked)] enter Initial
    -3> 2016-04-04 16:07:37.172346 7f28d457f880  5 osd.35 pg_epoch: 3582 
pg[34.8( v 3334'2042 (0'0,3334'2042] lb 0//0//-1 local-les=3494 n=0 ec=709 
les/c 3494/3502 3542/3542/3368) [33,38] r=-1 lpr=0 pi=3368-3541/4 crt=3334'2042 
lcod 0'0 inactive NOTIFY] exit Initial 0.025604 0 0.000000
    -2> 2016-04-04 16:07:37.172387 7f28d457f880  5 osd.35 pg_epoch: 3582 
pg[34.8( v 3334'2042 (0'0,3334'2042] lb 0//0//-1 local-les=3494 n=0 ec=709 
les/c 3494/3502 3542/3542/3368) [33,38] r=-1 lpr=0 pi=3368-3541/4 crt=3334'2042 
lcod 0'0 inactive NOTIFY] enter Reset
    -1> 2016-04-04 16:07:37.189150 7f28d457f880  5 osd.35 pg_epoch: 3585 
pg[34.9(unlocked)] enter Initial
     0> 2016-04-04 16:07:37.268571 7f28d457f880 -1 *** Caught signal (Aborted) 
**
in thread 7f28d457f880

ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
1: /usr/bin/ceph-osd() [0xac5c32]
2: (()+0xf130) [0x7f28d2f0d130]
3: (gsignal()+0x37) [0x7f28d19275d7]
4: (abort()+0x148) [0x7f28d1928cc8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f28d222b9b5]
6: (()+0x5e926) [0x7f28d2229926]
7: (()+0x5e953) [0x7f28d2229953]
8: (()+0x5eb73) [0x7f28d2229b73]
9: (pg_log_entry_t::decode_with_checksum(ceph::buffer::list::iterator&)+0x230) 
[0x792510]
10: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t 
const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, 
std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, 
pg_missing_t&, std::basic_ostringstream<char, std::char_traits<char>, 
std::allocator<char> >&, std::set<std::string, std::less<std::string>, 
std::allocator<std::string> >*)+0xa2f) [0x76b9bf]
11: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x34f) [0x7edebf]
12: (OSD::load_pgs()+0xa7a) [0x6b6a8a]
13: (OSD::init()+0x729) [0x6b9319]
14: (main()+0x27f3) [0x643ed3]
15: (__libc_start_main()+0xf5) [0x7f28d1913af5]
16: /usr/bin/ceph-osd() [0x65d139]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.35.log
--- end dump of recent events ---
[root@cephosd4 ~]#

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to