I had one of my OSDs crash yesterday. I'm using ceph version 0.56.3 
(6eb7e15a4783b122e9b0c85ea9ba064145958aa5). 

The part of the log file where the crash happened is attached. I'm not really sure 
what led up to it, but I did get an alert from my server monitor telling me my 
swap space got really low around the time it crashed. 
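
For what it's worth, frames 5-9 of the backtrace read like an uncaught 
std::bad_alloc: ceph::buffer::create_page_aligned() couldn't allocate the buffer 
for an incoming message in Pipe::read_message(), nothing caught the exception, 
and libstdc++'s __verbose_terminate_handler() ended up calling abort(). That 
would line up with the low-swap alert. Here's a rough, hypothetical C++ sketch 
of that failure path (not the actual Ceph code, just to illustrate the 
mechanism; the buffer size and function bodies are made up):

#include <stdlib.h>      // posix_memalign, free
#include <new>           // std::bad_alloc
#include <thread>        // std::thread

// Stand-in for ceph::buffer::create_page_aligned(): get a page-aligned
// buffer, throwing std::bad_alloc if the allocation is refused.
static void* create_page_aligned(size_t len) {
    void* p = nullptr;
    if (posix_memalign(&p, 4096, len) != 0)  // fails once memory + swap run out
        throw std::bad_alloc();              // no caller in this path catches it
    return p;
}

// Analogous to Pipe::reader() -> Pipe::read_message(): size a buffer from a
// message header and allocate it on the reader thread.
static void reader_thread() {
    size_t msg_len = 4u * 1024 * 1024;       // hypothetical message payload size
    void* buf = create_page_aligned(msg_len);
    free(buf);
}

int main() {
    // If bad_alloc is thrown above, it escapes the thread function, libstdc++
    // calls std::terminate() (the __verbose_terminate_handler frames in the
    // backtrace), which calls abort() -> "Caught signal (Aborted)".
    std::thread t(reader_thread);
    t.join();
    return 0;
}

If that reading is right, the crash is just the process dying when an 
allocation failed under memory pressure, rather than something going wrong in 
the OSD logic itself.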

The OSD rejoined the cluster after I restarted the service. Currently, I'm waiting 
patiently for the last 1 of my 400 PGs to come out of active+clean+scrubbing status. 

Dave Spano 
Optogenics 
Systems Administrator 


   -17> 2013-03-03 13:02:13.478152 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.13039.0:6860359, seq: 5393222, time: 2013-03-03 13:02:13.478134, event: write_thread_in_journal_buffer, request: osd_sub_op(client.13039.0:6860359 3.0 a10c17c8/rb.0.2dd7.16d28c4f.00000000002f/head//3 [] v 411'1980074 snapset=0=[]:[] snapc=0=[]) v7
   -16> 2013-03-03 13:02:13.478153 7f5d559ab700  1 -- 192.168.3.11:6801/4500 --> osd.1 192.168.3.12:6802/2467 -- osd_sub_op_reply(client.14000.1:570700 0.16 5e01a96/100003797f2.00000000/head//0 [] ondisk, result = 0) v1 -- ?+0 0xc45cc80
   -15> 2013-03-03 13:02:13.478184 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.14000.1:570701, seq: 5393223, time: 2013-03-03 13:02:13.478184, event: write_thread_in_journal_buffer, request: osd_sub_op(client.14000.1:570701 0.22 40dccca2/100001164ca.00000002/head//0 [] v 411'447369 snapset=0=[]:[] snapc=0=[]) v7
   -14> 2013-03-03 13:02:13.478209 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625658, seq: 5393225, time: 2013-03-03 13:02:13.478209, event: write_thread_in_journal_buffer, request: osd_sub_op(client.11755.0:2625658 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095529 snapset=0=[]:[] snapc=0=[]) v7
   -13> 2013-03-03 13:02:13.478234 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625659, seq: 5393226, time: 2013-03-03 13:02:13.478234, event: write_thread_in_journal_buffer, request: osd_sub_op(client.11755.0:2625659 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095530 snapset=0=[]:[] snapc=0=[]) v7
   -12> 2013-03-03 13:02:13.484696 7f5d549a9700  1 -- 192.168.3.11:6800/4500 <== client.11755 192.168.1.64:0/1062411 90128 ==== ping v1 ==== 0+0+0 (0 0 0) 0xff4e000 con 0x307a6e0
   -11> 2013-03-03 13:02:13.489457 7f5d4f99f700  5 --OSD::tracker-- reqid: client.11755.0:2625660, seq: 5393227, time: 2013-03-03 13:02:13.489457, event: started, request: osd_sub_op(client.11755.0:2625660 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095531 snapset=0=[]:[] snapc=0=[]) v7
   -10> 2013-03-03 13:02:13.489503 7f5d4f99f700  5 --OSD::tracker-- reqid: client.11755.0:2625660, seq: 5393227, time: 2013-03-03 13:02:13.489503, event: commit_queued_for_journal_write, request: osd_sub_op(client.11755.0:2625660 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095531 snapset=0=[]:[] snapc=0=[]) v7
    -9> 2013-03-03 13:02:13.571632 7f5d501a0700  5 --OSD::tracker-- reqid: client.11755.0:2625657, seq: 5393224, time: 2013-03-03 13:02:13.571631, event: started, request: osd_op(client.11755.0:2625657 rb.0.2ea4.614c277f.00000000003d [write 1253376~4096] 3.c7bd6ff1) v4
    -8> 2013-03-03 13:02:13.571661 7f5d501a0700  5 --OSD::tracker-- reqid: client.11755.0:2625657, seq: 5393224, time: 2013-03-03 13:02:13.571661, event: started, request: osd_op(client.11755.0:2625657 rb.0.2ea4.614c277f.00000000003d [write 1253376~4096] 3.c7bd6ff1) v4
    -7> 2013-03-03 13:02:13.571733 7f5d501a0700  5 --OSD::tracker-- reqid: client.11755.0:2625657, seq: 5393224, time: 2013-03-03 13:02:13.571733, event: waiting for subops from [1], request: osd_op(client.11755.0:2625657 rb.0.2ea4.614c277f.00000000003d [write 1253376~4096] 3.c7bd6ff1) v4
    -6> 2013-03-03 13:02:13.598028 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.13039.0:6860359, seq: 5393222, time: 2013-03-03 13:02:13.598027, event: journaled_completion_queued, request: osd_sub_op(client.13039.0:6860359 3.0 a10c17c8/rb.0.2dd7.16d28c4f.00000000002f/head//3 [] v 411'1980074 snapset=0=[]:[] snapc=0=[]) v7
    -5> 2013-03-03 13:02:13.598061 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.14000.1:570701, seq: 5393223, time: 2013-03-03 13:02:13.598061, event: journaled_completion_queued, request: osd_sub_op(client.14000.1:570701 0.22 40dccca2/100001164ca.00000002/head//0 [] v 411'447369 snapset=0=[]:[] snapc=0=[]) v7
    -4> 2013-03-03 13:02:13.598081 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625658, seq: 5393225, time: 2013-03-03 13:02:13.598081, event: journaled_completion_queued, request: osd_sub_op(client.11755.0:2625658 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095529 snapset=0=[]:[] snapc=0=[]) v7
    -3> 2013-03-03 13:02:13.598098 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625659, seq: 5393226, time: 2013-03-03 13:02:13.598098, event: journaled_completion_queued, request: osd_sub_op(client.11755.0:2625659 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095530 snapset=0=[]:[] snapc=0=[]) v7
    -2> 2013-03-03 13:02:13.598134 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625660, seq: 5393227, time: 2013-03-03 13:02:13.598134, event: write_thread_in_journal_buffer, request: osd_sub_op(client.11755.0:2625660 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095531 snapset=0=[]:[] snapc=0=[]) v7
    -1> 2013-03-03 13:02:13.598257 7f5d5a9b5700  5 --OSD::tracker-- reqid: client.11755.0:2625660, seq: 5393227, time: 2013-03-03 13:02:13.598257, event: journaled_completion_queued, request: osd_sub_op(client.11755.0:2625660 3.7 2cb006a7/rb.0.2ea4.614c277f.00000000103d/head//3 [] v 411'6095531 snapset=0=[]:[] snapc=0=[]) v7
     0> 2013-03-03 13:02:13.753064 7f5d4c097700 -1 *** Caught signal (Aborted) **
 in thread 7f5d4c097700

 ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
 1: /usr/bin/ceph-osd() [0x78430a]
 2: (()+0xfcb0) [0x7f5d60fc3cb0]
 3: (gsignal()+0x35) [0x7f5d5f982425]
 4: (abort()+0x17b) [0x7f5d5f985b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f5d602d469d]
 6: (()+0xb5846) [0x7f5d602d2846]
 7: (()+0xb5873) [0x7f5d602d2873]
 8: (()+0xb596e) [0x7f5d602d296e]
 9: (ceph::buffer::create_page_aligned(unsigned int)+0x95) [0x82ef25]
 10: (Pipe::read_message(Message**)+0x2421) [0x8d3591]
 11: (Pipe::reader()+0x8c2) [0x8e3db2]
 12: (Pipe::Reader::entry()+0xd) [0x8e668d]
 13: (()+0x7e9a) [0x7f5d60fbbe9a]
 14: (clone()+0x6d) [0x7f5d5fa3fcbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent    100000
  max_new         1000
  log_file /var/log/ceph/osd.0.log
--- end dump of recent events ---
