Hi, several osds were down/out with similar logs as below, could you help?
-38> 2014-06-05 10:27:54.700832 7f2ceead6700 1 -- 192.168.40.11:6800/19542
<== osd.11 192.168.40.11:6822/20298 2 ==== pg_notify(0.aa4(2) epoch 7) v5 ====
812+0+0 (3873498789 0 0) 0x57a0540 con 0x49d14a0
-37> 2014-06-05 10:27:54.700874 7f2ceead6700 5 -- op tracker -- , seq:
2028, time: 2014-06-05 10:27:54.396463, event: header_read, request:
pg_notify(0.aa4(2) epoch 7) v5
-36> 2014-06-05 10:27:54.701091 7f2ceead6700 5 -- op tracker -- , seq:
2028, time: 2014-06-05 10:27:54.396465, event: throttled, request:
pg_notify(0.aa4(2) epoch 7) v5
-35> 2014-06-05 10:27:54.701126 7f2ceead6700 5 -- op tracker -- , seq:
2028, time: 2014-06-05 10:27:54.396543, event: all_read, request:
pg_notify(0.aa4(2) epoch 7) v5
-34> 2014-06-05 10:27:54.701187 7f2ceead6700 5 -- op tracker -- , seq:
2028, time: 2014-06-05 10:27:54.700870, event: dispatched, request:
pg_notify(0.aa4(2) epoch 7) v5
-33> 2014-06-05 10:27:54.701235 7f2ceead6700 5 -- op tracker -- , seq:
2028, time: 2014-06-05 10:27:54.701234, event: waiting_for_osdmap, request:
pg_notify(0.aa4(2) epoch 7) v5
-32> 2014-06-05 10:27:54.701267 7f2ceead6700 5 -- op tracker -- , seq:
2028, time: 2014-06-05 10:27:54.701267, event: started, request:
pg_notify(0.aa4(2) epoch 7) v5
-31> 2014-06-05 10:27:54.700940 7f2ce92cb700 5 osd.0 pg_epoch: 7 pg[0.1dc1(
empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [69,56,21] r=-1 lpr=7 pi=3-6/1
crt=0'0 inactive NOTIFY] exit Start 0.003321 0 0.000000
-30> 2014-06-05 10:27:54.702163 7f2ceead6700 5 -- op tracker -- , seq:
2028, time: 2014-06-05 10:27:54.702163, event: done, request:
pg_notify(0.aa4(2) epoch 7) v5
-29> 2014-06-05 10:27:54.701690 7f2ce92cb700 5 osd.0 pg_epoch: 7 pg[0.1dc1(
empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [69,56,21] r=-1 lpr=7 pi=3-6/1
crt=0'0 inactive NOTIFY] enter Started/Stray
-28> 2014-06-05 10:27:54.697805 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98(
empty local-les=4 n=0 ec=1 les/c 4/4 3/3/3) [0,100,64] r=0 lpr=3 crt=0'0 mlcod
0'0 active] exit Started/Primary/Active 7.527967 0 0.000000
-27> 2014-06-05 10:27:54.702437 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98(
empty local-les=4 n=0 ec=1 les/c 4/4 3/3/3) [0,100,64] r=0 lpr=3 crt=0'0 mlcod
0'0 active] exit Started/Primary 8.986091 0 0.000000
-26> 2014-06-05 10:27:54.702467 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98(
empty local-les=4 n=0 ec=1 les/c 4/4 3/3/3) [0,100,64] r=0 lpr=3 crt=0'0 mlcod
0'0 active] exit Started 8.986256 0 0.000000
-25> 2014-06-05 10:27:54.702486 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98(
empty local-les=4 n=0 ec=1 les/c 4/4 3/3/3) [0,100,64] r=0 lpr=3 crt=0'0 mlcod
0'0 active] enter Reset
-24> 2014-06-05 10:27:54.702822 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98(
empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1
crt=0'0 inactive NOTIFY] exit Reset 0.000336 1 0.005574
-23> 2014-06-05 10:27:54.702900 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98(
empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1
crt=0'0 inactive NOTIFY] enter Started
-22> 2014-06-05 10:27:54.703025 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98(
empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1
crt=0'0 inactive NOTIFY] enter Start
-21> 2014-06-05 10:27:54.703235 7f2ce8aca700 1 osd.0 pg_epoch: 7 pg[1.98(
empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1
crt=0'0 inactive NOTIFY] state<Start>: transitioning to Stray
-20> 2014-06-05 10:27:54.704919 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98(
empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1
crt=0'0 inactive NOTIFY] exit Start 0.001894 0 0.000000
-19> 2014-06-05 10:27:54.704835 7f2ceead6700 1 -- 192.168.40.11:6800/19542
<== osd.15 192.168.40.11:6830/20581 1 ==== pg_notify(2.60(2) epoch 7) v5 ====
812+0+0 (3557830937 0 0) 0x4a6f540 con 0x3e3bb20
-18> 2014-06-05 10:27:54.705050 7f2ceead6700 5 -- op tracker -- , seq:
2029, time: 2014-06-05 10:27:54.598458, event: header_read, request:
pg_notify(2.60(2) epoch 7) v5
-17> 2014-06-05 10:27:54.705079 7f2ceead6700 5 -- op tracker -- , seq:
2029, time: 2014-06-05 10:27:54.598461, event: throttled, request:
pg_notify(2.60(2) epoch 7) v5
-16> 2014-06-05 10:27:54.705124 7f2ce8aca700 5 osd.0 pg_epoch: 7 pg[1.98(
empty local-les=4 n=0 ec=1 les/c 4/4 7/7/7) [54,95,68] r=-1 lpr=7 pi=3-6/1
crt=0'0 inactive NOTIFY] enter Started/Stray
-15> 2014-06-05 10:27:54.705353 7f2ceead6700 5 -- op tracker -- , seq:
2029, time: 2014-06-05 10:27:54.598689, event: all_read, request:
pg_notify(2.60(2) epoch 7) v5
-14> 2014-06-05 10:27:54.706925 7f2ceead6700 5 -- op tracker -- , seq:
2029, time: 2014-06-05 10:27:54.705046, event: dispatched, request:
pg_notify(2.60(2) epoch 7) v5
-13> 2014-06-05 10:27:54.707038 7f2ceead6700 5 -- op tracker -- , seq:
2029, time: 2014-06-05 10:27:54.707037, event: waiting_for_osdmap, request:
pg_notify(2.60(2) epoch 7) v5
-12> 2014-06-05 10:27:54.707075 7f2ceead6700 5 -- op tracker -- , seq:
2029, time: 2014-06-05 10:27:54.707075, event: started, request:
pg_notify(2.60(2) epoch 7) v5
-11> 2014-06-05 10:27:54.708042 7f2ceead6700 5 osd.0 pg_epoch: 7
pg[2.60(unlocked)] enter Initial
-10> 2014-06-05 10:27:54.709885 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60(
empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=0 pi=1-6/2
crt=0'0 mlcod 0'0 inactive] exit Initial 0.001842 0 0.000000
-9> 2014-06-05 10:27:54.711556 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60(
empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=0 pi=1-6/2
crt=0'0 mlcod 0'0 inactive] enter Reset
-8> 2014-06-05 10:27:54.711606 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60(
empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2
crt=0'0 mlcod 0'0 inactive] exit Reset 0.000051 1 0.001718
-7> 2014-06-05 10:27:54.711648 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60(
empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2
crt=0'0 mlcod 0'0 inactive] enter Started
-6> 2014-06-05 10:27:54.711714 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60(
empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2
crt=0'0 mlcod 0'0 inactive] enter Start
-5> 2014-06-05 10:27:54.711781 7f2ceead6700 1 osd.0 pg_epoch: 7 pg[2.60(
empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2
crt=0'0 mlcod 0'0 inactive] state<Start>: transitioning to Primary
-4> 2014-06-05 10:27:54.713010 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60(
empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2
crt=0'0 mlcod 0'0 inactive] exit Start 0.001295 0 0.000000
-3> 2014-06-05 10:27:54.713164 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60(
empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2
crt=0'0 mlcod 0'0 inactive] enter Started/Primary
-2> 2014-06-05 10:27:54.713234 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60(
empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2
crt=0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
-1> 2014-06-05 10:27:54.713319 7f2ceead6700 5 osd.0 pg_epoch: 7 pg[2.60(
empty local-les=0 n=0 ec=1 les/c 4/5 7/7/7) [0,117,49] r=0 lpr=7 pi=1-6/2
crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
0> 2014-06-05 10:27:54.711223 7f2ce92cb700 -1 common/Thread.cc: In
function 'void Thread::create(size_t)' thread 7f2ce92cb700 time 2014-06-05
10:27:54.703693
common/Thread.cc: 110: FAILED assert(ret == 0)
ceph version 0.80 (b78644e7dee100e48dfeca32c9270a6b210d3003)
1: (Thread::create(unsigned long)+0x8a) [0xa82dea]
2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*,
Message*)+0x17b) [0xa2b01b]
3: (SimpleMessenger::get_connection(entity_inst_t const&)+0x180) [0xa2f6c0]
4: (OSDService::get_con_osd_cluster(int, unsigned int)+0x1d9) [0x6061c9]
5: (OSD::compat_must_dispatch_immediately(PG*)+0x296) [0x6064a6]
6: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&,
ThreadPool::TPHandle&)+0x4a6) [0x6475d6]
7: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&,
ThreadPool::TPHandle&)+0x16) [0x69dbf6]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0xab4581]
9: (ThreadPool::WorkThread::entry()+0x10) [0xab75c0]
10: /lib64/libpthread.so.0() [0x3eebc079d1]
11: (clone()+0x6d) [0x3eeb4e8b6d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.0.log
--- end dump of recent events ---
2014-06-05 10:27:55.096296 7f2ca8628700 0 -- 192.168.40.11:6800/19542 >>
192.168.40.11:6808/19811 pipe(0x4adc100 sd=629 :6800 s=2 pgs=111 cs=1 l=0
c=0x4a0c780).fault with nothing to send, going to standby
2014-06-05 10:27:55.129281 7f2ce92cb700 -1 *** Caught signal (Aborted) **
in thread 7f2ce92cb700
ceph version 0.80 (b78644e7dee100e48dfeca32c9270a6b210d3003)
1: /usr/bin/ceph-osd() [0x9aa211]
2: /lib64/libpthread.so.0() [0x3eebc0f710]
3: (gsignal()+0x35) [0x3eeb432925]
4: (abort()+0x175) [0x3eeb434105]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3eed8bea5d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- begin dump of recent events ---
-6> 2014-06-05 10:27:54.716422 7f2cec2d1700 1 -- 192.168.40.11:6801/19542
<== osd.119 192.168.40.14:0/29977 3 ==== osd_ping(ping e7 stamp 2014-06-05
10:27:54.713281) v2 ==== 47+0+0 (243568413 0 0) 0x4a68380 con 0x3e3c200
-5> 2014-06-05 10:27:54.716480 7f2cec2d1700 1 -- 192.168.40.11:6801/19542
--> 192.168.40.14:0/29977 -- osd_ping(ping_reply e7 stamp 2014-06-05
10:27:54.713281) v2 -- ?+0 0x5436740 con 0x3e3c200
-4> 2014-06-05 10:27:54.717351 7f2ced2d3700 1 -- 192.168.50.11:6801/19542
<== osd.119 192.168.40.14:0/29977 3 ==== osd_ping(ping e7 stamp 2014-06-05
10:27:54.713281) v2 ==== 47+0+0 (243568413 0 0) 0x4a68fc0 con 0x3e3c8e0
-3> 2014-06-05 10:27:55.096211 7f2ca8628700 2 -- 192.168.40.11:6800/19542
>> 192.168.40.11:6808/19811 pipe(0x4adc100 sd=629 :6800 s=2 pgs=111 cs=1 l=0
c=0x4a0c780).reader couldn't read tag, (0) Success
-2> 2014-06-05 10:27:55.096258 7f2ca8628700 2 -- 192.168.40.11:6800/19542
>> 192.168.40.11:6808/19811 pipe(0x4adc100 sd=629 :6800 s=2 pgs=111 cs=1 l=0
c=0x4a0c780).fault (0) Success
-1> 2014-06-05 10:27:55.096296 7f2ca8628700 0 -- 192.168.40.11:6800/19542
>> 192.168.40.11:6808/19811 pipe(0x4adc100 sd=629 :6800 s=2 pgs=111 cs=1 l=0
c=0x4a0c780).fault with nothing to send, going to standby
0> 2014-06-05 10:27:55.129281 7f2ce92cb700 -1 *** Caught signal (Aborted)
**
in thread 7f2ce92cb700
ceph version 0.80 (b78644e7dee100e48dfeca32c9270a6b210d3003)
1: /usr/bin/ceph-osd() [0x9aa211]
2: /lib64/libpthread.so.0() [0x3eebc0f710]
3: (gsignal()+0x35) [0x3eeb432925]
4: (abort()+0x175) [0x3eeb434105]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x3eed8bea5d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 keyvaluestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.0.log
--- end dump of recent events ---
Wei Cao (Buddy)
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com