Hi Gregory,

(I'm a colleague of Kenneth.)

1) How big and what shape the filesystem is. Do you have some
extremely large directory that the MDS keeps trying to load and then
dump?
Is there any way to extract this from the MDS without having to start it? As it was an rsync operation, I can try to locate possible candidates on the source filesystem, but what would be considered "large"?
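In the meantime, here is a minimal sketch of how we could scan the rsync source for candidates (the 100k-entry threshold is just a guess on our part, not a number taken from the MDS):

#!/usr/bin/env python
# Sketch: walk the rsync source tree and report directories with very
# large entry counts. The threshold below is an arbitrary guess at what
# might count as "large".
import os
import sys

THRESHOLD = 100000  # assumed cut-off for "large"

def scan(root):
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count >= THRESHOLD:
            print("%10d  %s" % (count, dirpath))

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else ".")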

2) Use tcmalloc's heap analyzer to see where all the memory is being allocated.
We'll give that a try.
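For the record, roughly how we plan to drive it (a sketch only; it assumes the daemon is mds.0 and that the "heap" admin commands Ceph exposes for tcmalloc behave the same on 0.78):

#!/usr/bin/env python
# Sketch: turn on the tcmalloc heap profiler in the running MDS and take
# periodic dumps while memory grows, then stop it. Assumes the daemon is
# named mds.0 and the ceph CLI can reach the cluster; please double-check
# the "heap" admin commands against 0.78 before relying on this.
import subprocess
import time

MDS = "mds.0"  # assumed daemon name

def heap(cmd):
    subprocess.check_call(["ceph", "tell", MDS, "heap", cmd])

heap("start_profiler")
for _ in range(10):      # one dump per minute for ~10 minutes
    time.sleep(60)
    heap("dump")
    heap("stats")
heap("stop_profiler")
# the resulting heap dumps can then be inspected with gperftools' pprof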

3) Look through the logs for when the beacon fails (the first of
"mds.0.16 is_laggy 600.641332 > 15 since last acked beacon") and see
if there's anything tell-tale going on at the time.

Anything in particular we should be looking for?
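For now we have been pulling the lines just before the first is_laggy complaint with something like this (the log path and the amount of context kept are our own guesses):

#!/usr/bin/env python
# Sketch: print the lines leading up to the first "is_laggy" beacon
# complaint in an MDS log, to see what the daemon was doing when it
# stopped acking beacons. The amount of context kept is arbitrary.
import collections
import sys

CONTEXT = 200  # number of preceding lines to keep

def context_before_laggy(path):
    tail = collections.deque(maxlen=CONTEXT)
    with open(path) as f:
        for line in f:
            if "is_laggy" in line:
                return list(tail) + [line]
            tail.append(line)
    return []

if __name__ == "__main__":
    for line in context_before_laggy(sys.argv[1]):
        sys.stdout.write(line)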

The log goes as follows:
The MDS starts around 11:43
...

2014-04-01 11:44:23.658583 7ffec89c6700  1 mds.0.server reconnect_clients -- 1 
sessions
2014-04-01 11:44:41.212488 7ffec89c6700  0 log [DBG] : reconnect by client.4585 
10.141.8.199:0/3551 after 17.553854
2014-04-01 11:44:45.692237 7ffec89c6700  1 mds.0.10 reconnect_done
2014-04-01 11:44:45.996384 7ffec89c6700  1 mds.0.10 handle_mds_map i am now 
mds.0.10
2014-04-01 11:44:45.996388 7ffec89c6700  1 mds.0.10 handle_mds_map state change 
up:reconnect --> up:rejoin
2014-04-01 11:44:45.996390 7ffec89c6700  1 mds.0.10 rejoin_start
2014-04-01 11:49:53.158471 7ffec89c6700  1 mds.0.10 rejoin_joint_start

then lots (4667 lines) of
2014-04-01 11:50:10.237035 7ffebc844700  0 -- 10.141.8.180:6837/55117 >> 
10.141.8.180:6789/0 pipe(0x38a7da00 sd=104 :41115 s=4 pgs=0 cs=0 l=1 
c=0x6513e8840).connect got RESETSESSION but no longer connecting

with one intermediate
2014-04-01 11:51:50.181354 7ffebcf4b700  0 -- 10.141.8.180:6837/55117 >> 
10.141.8.180:6789/0 pipe(0x10e282580 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xc77d5ee0).fault


then a sudden change
2014-04-01 11:57:30.440554 7ffebcd49700  0 -- 10.141.8.180:6837/55117 >> 
10.141.8.182:6789/0 pipe(0xa1534100 sd=104 :48176 s=4 pgs=0 cs=0 l=1 
c=0xd99b11e0).connect got RESETSESSION but no longer connecting
2014-04-01 11:57:30.722607 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 
10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 
c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722669 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 
10.141.8.181:6789/0 pipe(0x1ce98a00 sd=104 :48235 s=1 pgs=0 cs=0 l=1 
c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722885 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 
10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 
c=0xc48a3f40).connect got BADAUTHORIZER
2014-04-01 11:57:30.722945 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 
10.141.8.181:6789/0 pipe(0x1ce98a00 sd=57 :48237 s=1 pgs=0 cs=0 l=1 
c=0xc48a3f40).connect got BADAUTHORIZER

followed by lots of
2014-04-01 11:57:30.738562 7ffebec68700  0 -- 10.141.8.180:6837/55117 >> 
10.141.8.181:6789/0 pipe(0xead9fa80 sd=57 :0 s=1 pgs=0 cs=0 l=1 c=0x10e5d280).fault

with sporadic
2014-04-01 11:57:32.431219 7ffebeb67700  0 -- 10.141.8.180:6837/55117 >> 
10.141.8.182:6789/0 pipe(0xef85cd80 sd=103 :41218 s=4 pgs=0 cs=0 l=1 
c=0x130590dc0).connect got RESETSESSION but no longer connecting


until the dump
2014-04-01 11:59:27.612850 7ffebea66700  0 -- 10.141.8.180:6837/55117 >> 
10.141.8.181:6789/0 pipe(0xe3036400 sd=103 :0 s=1 pgs=0 cs=0 l=1 c=0xa7be300).fault
2014-04-01 11:59:27.639009 7ffec89c6700 -1 mds/CDir.cc: In function 'void 
CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char, 
std::char_traits<char>, std::allocator\
<char> >, ceph::buffer::list, std::less<std::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::allocator<std::pair<const std::basic_string<char, std::char_trait\
s<char>, std::allocator<char> >, ceph::buffer::list> > >&, const std::string&, 
int)' thread 7ffec89c6700 time 2014-04-01 11:59:27.620684
mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)

 ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
 1: (CDir::_omap_fetched(ceph::buffer::list&, std::map<std::string, ceph::buffer::list, 
std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > 
>&, st\
d::string const&, int)+0x4d71) [0x77c3c1]
 2: (Context::complete(int)+0x9) [0x56bb79]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x11a6) [0x806dd6]
 4: (MDS::handle_core_message(Message*)+0x9c7) [0x5901d7]
 5: (MDS::_dispatch(Message*)+0x2f) [0x59028f]
 6: (MDS::ms_dispatch(Message*)+0x1ab) [0x591d4b]
 7: (DispatchQueue::entry()+0x582) [0x902072]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x85ef4d]
 9: /lib64/libpthread.so.0() [0x34c36079d1]
 10: (clone()+0x6d) [0x34c32e8b6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.

--- begin dump of recent events ---
-10000> 2014-04-01 11:59:27.137779 7ffec89c6700  5 mds.0.10 initiating monitor 
reconnect; maybe we're not the slow one
 -9999> 2014-04-01 11:59:27.137787 7ffec89c6700 10 monclient(hunting): 
_reopen_session rank -1 name
 -9998> 2014-04-01 11:59:27.137790 7ffec89c6700  1 -- 10.141.8.180:6837/55117 
mark_down 0x1326be460 -- 0x2cae5f00
 -9997> 2014-04-01 11:59:27.137807 7ffebf672700  2 -- 10.141.8.180:6837/55117 
>> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0 s=4 pgs=0 cs=0 l=1 
c=0x1326be460).connect error 10.141.8.1\
81:6789/0, 104: Connection reset by peer
 -9996> 2014-04-01 11:59:27.137819 7ffebf672700  3 -- 10.141.8.180:6837/55117 
>> 10.141.8.181:6789/0 pipe(0x2cae5f00 sd=57 :0 s=4 pgs=0 cs=0 l=1 
c=0x1326be460).connect fault, but state\
 = closed != connecting, stopping
 -9995> 2014-04-01 11:59:27.137824 7ffec89c6700 10 monclient(hunting): picked 
mon.ceph003 con 0x3af54ba0 addr 10.141.8.182:6789/0
 -9994> 2014-04-01 11:59:27.137832 7ffec89c6700 10 monclient(hunting): 
_send_mon_message to mon.ceph003 at 10.141.8.182:6789/0





-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Wed, Apr 2, 2014 at 3:39 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

----- Message from Gregory Farnum <g...@inktank.com> ---------
    Date: Tue, 1 Apr 2014 09:03:17 -0700
    From: Gregory Farnum <g...@inktank.com>

Subject: Re: [ceph-users] ceph 0.78 mon and mds crashing (bus error)
      To: "Yan, Zheng" <uker...@gmail.com>
      Cc: Kenneth Waegeman <kenneth.waege...@ugent.be>, ceph-users
<ceph-users@lists.ceph.com>



On Tue, Apr 1, 2014 at 7:12 AM, Yan, Zheng <uker...@gmail.com> wrote:

On Tue, Apr 1, 2014 at 10:02 PM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:

After some more searching, I've found that the source of the problem is
with the mds and not the mon. The mds crashes, generates a core dump that
eats the local space, and in turn the monitor (because of leveldb) crashes.

The error in the mds log of one host:

2014-04-01 15:46:34.414615 7f870e319700  0 -- 10.141.8.180:6836/13152 >>
10.141.8.180:6789/0 pipe(0x517371180 sd=54 :42439 s=4 pgs=0 cs=0 l=1
c=0x147ac780).connect got RESETSESSION but no longer connecting
2014-04-01 15:46:34.438792 7f871194f700  0 -- 10.141.8.180:6836/13152 >>
10.141.8.180:6789/0 pipe(0x1b099f580 sd=8 :43150 s=4 pgs=0 cs=0 l=1
c=0x1fd44360).connect got RESETSESSION but no longer connecting
2014-04-01 15:46:34.439028 7f870e319700  0 -- 10.141.8.180:6836/13152 >>
10.141.8.182:6789/0 pipe(0x13aa64880 sd=54 :37085 s=4 pgs=0 cs=0 l=1
c=0x1fd43de0).connect got RESETSESSION but no longer connecting
2014-04-01 15:46:34.468257 7f871b7ae700 -1 mds/CDir.cc: In function
'void
CDir::_omap_fetched(ceph::bufferlist&, std::map<std::basic_string<char,
std::char_traits<char>, std::allocator<char> >, ceph::buffer::list,
std::less<std::basic_string<char, std::char_traits<char>,
std::allocator<char> > >, std::allocator<std::pair<const
std::basic_string<char, std::char_traits<char>, std::allocator<char> >,
ceph::buffer::list> > >&, const std::string&, int)' thread 7f871b7ae700
time
2014-04-01 15:46:34.448320
mds/CDir.cc: 1474: FAILED assert(r == 0 || r == -2 || r == -61)


Could you use gdb to check what the value of variable 'r' is?


If you look at the crash dump log you can see the return value in the
osd_op_reply message:
-1> 2014-04-01 15:46:34.440860 7f871b7ae700  1 --
10.141.8.180:6836/13152 <== osd.3 10.141.8.180:6827/4366 33077 ====
osd_op_reply(4179177 100001f2ef1.00000000 [omap-get-header
0~0,omap-get-vals 0~16] v0'0 uv0 ack = -108 (Cannot send after
transport endpoint shutdown)) v6 ==== 229+0+0 (958358678 0 0)
0x2cff7aa80 con 0x37ea3c0

-108, which is ESHUTDOWN, but we also use it (via the 108 constant, I
think because ESHUTDOWN varies across platforms) as EBLACKLISTED.
So it looks like this is itself actually a symptom of another problem
that is causing the MDS to get timed out on the monitor. If a core
dump is "eating the local space", maybe the MDS is stuck in an
infinite allocation loop of some kind? How big are your disks,
Kenneth? Do you have any information on how much CPU/memory the MDS
was using before this?


I monitored the mds process after restart:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19215 root      20   0 6070m 5.7g 5236 S 778.6 18.1   1:27.54 ceph-mds
19215 root      20   0 7926m 7.5g 5236 S 179.2 23.8   2:44.39 ceph-mds
19215 root      20   0 12.4g  12g 5236 S 157.2 38.8   3:43.47 ceph-mds
19215 root      20   0 16.6g  16g 5236 S 144.4 52.0   4:15.01 ceph-mds
19215 root      20   0 19.9g  19g 5236 S 137.2 62.5   4:35.83 ceph-mds
19215 root      20   0 24.5g  24g 5224 S 136.5 77.0   5:04.66 ceph-mds
19215 root      20   0 25.8g  25g 2944 S 33.7 81.2   5:13.74 ceph-mds
19215 root      20   0 26.0g  25g 2916 S 24.6 81.7   5:19.07 ceph-mds
19215 root      20   0 26.1g  25g 2916 S 13.0 82.1   5:22.16 ceph-mds
19215 root      20   0 27.7g  26g 1856 S 100.0 85.8   5:36.46 ceph-mds

Then it crashes. I changed the core dump location out of the root fs; the
core dump is indeed about 26G.

My disks:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       9.9G  2.9G  6.5G  31% /
tmpfs            16G     0   16G   0% /dev/shm
/dev/sda1       248M   53M  183M  23% /boot
/dev/sda4       172G   61G  112G  35% /var/lib/ceph/log/sda4
/dev/sdb        187G   61G  127G  33% /var/lib/ceph/log/sdb
/dev/sdc        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdc
/dev/sdd        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdd
/dev/sde        3.7T  1.4T  2.4T  37% /var/lib/ceph/osd/sde
/dev/sdf        3.7T  1.5T  2.3T  39% /var/lib/ceph/osd/sdf
/dev/sdg        3.7T  2.1T  1.7T  56% /var/lib/ceph/osd/sdg
/dev/sdh        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdh
/dev/sdi        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdi
/dev/sdj        3.7T  1.7T  2.0T  47% /var/lib/ceph/osd/sdj
/dev/sdk        3.7T  2.1T  1.6T  58% /var/lib/ceph/osd/sdk
/dev/sdl        3.7T  1.7T  2.0T  46% /var/lib/ceph/osd/sdl
/dev/sdm        3.7T  1.5T  2.2T  41% /var/lib/ceph/osd/sdm
/dev/sdn        3.7T  1.4T  2.3T  38% /var/lib/ceph/osd/sdn




-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com



----- End message from Gregory Farnum <g...@inktank.com> -----



--

Met vriendelijke groeten,
Kenneth Waegeman



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
