On Mar 7, 2013, at 9:24 AM, Greg Farnum <[email protected]> wrote:

> This isn't bringing up anything in my brain, but I don't know what that
> _sample() function is actually doing -- did you get any farther into it?

_sample reads /proc/self/maps in a loop until EOF or some other conditions.
I couldn't figure out whether the thread was stuck in _sample or a level up.
Anyhow, my gdb-fu isn't stellar and I managed to crash the MDS. I'm going to
stick some log points in and try to reproduce it.

> -Greg
>
> On Wednesday, March 6, 2013 at 6:23 PM, Noah Watkins wrote:
>
>> Which, looks to be in a tight loop in the memory model _sample…
>>
>> (gdb) bt
>> #0 0x00007f0270d84d2d in read () from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1 0x00007f027046dd88 in std::__basic_file<char>::xsgetn(char*, long) ()
>> from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #2 0x00007f027046f4c5 in std::basic_filebuf<char, std::char_traits<char>
>> >::underflow() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #3 0x00007f0270467ceb in std::basic_istream<char, std::char_traits<char> >&
>> std::getline<char, std::char_traits<char>, std::allocator<char>
>> >(std::basic_istream<char, std::char_traits<char> >&,
>> std::basic_string<char, std::char_traits<char>, std::allocator<char> >&,
>> char) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>> #4 0x000000000072bdd4 in MemoryModel::_sample(MemoryModel::snap*) ()
>> #5 0x00000000005658db in MDCache::check_memory_usage() ()
>> #6 0x00000000004ba929 in MDS::tick() ()
>> #7 0x0000000000794c65 in SafeTimer::timer_thread() ()
>> #8 0x00000000007958ad in SafeTimerThread::entry() ()
>> #9 0x00007f0270d7de9a in start_thread () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>>
>> On Mar 6, 2013, at 6:18 PM, Noah Watkins <[email protected]> wrote:
>>
>>>
>>> On Mar 6, 2013, at 5:57 PM, Noah Watkins <[email protected]> wrote:
>>>
>>>> The MDS process in my cluster is running at 100% CPU. In fact I thought
>>>> the cluster came down, but rather an ls was taking a minute. There aren't
>>>> any clients active. I've left the process running in case there is any
>>>> probing you'd like to do on it:
>>>>
>>>> virt res cpu
>>>> 4629m 88m 5260 S 92 1.1 113:32.79 ceph-mds
>>>>
>>>> Thanks,
>>>> Noah
>>>
>>>
>>> This is a ceph-mds child thread under strace. The only thread
>>> that appears to be doing anything.
>>>
>>> root@issdm-44:/home/hadoop/hadoop-common# strace -p 3372
>>> Process 3372 attached - interrupt to quit
>>> read(1649, "7f0203235000-7f0203236000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0205053000-7f0205054000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0206e71000-7f0206e72000 ---p 0"..., 8191) = 4050
>>> read(1649, "7f0214144000-7f0214244000 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0215f62000-7f0216062000 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0217d80000-7f0217e80000 rw-p 0"..., 8191) = 4020
>>> read(1649, "7f0219b9e000-7f0219c9e000 rw-p 0"..., 8191) = 4020
>>> ...
>>>
>>> That file looks to be:
>>>
>>> ceph-mds 3337 root 1649r REG 0,3 0 266903 /proc/3337/maps
>>>
>>> (3337 is the parent process).
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
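
For reference, the pattern described at the top of this message (reading /proc/self/maps line by line with getline until EOF) can be sketched roughly as below. This is only an illustration and not the actual MemoryModel::_sample code: the MapsSummary type and both function names are hypothetical, and the real function stops on "some other conditions" that are not reproduced here.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical summary type for this sketch; it is NOT Ceph's MemoryModel::snap.
struct MapsSummary {
    std::uint64_t lines = 0;         // lines consumed from the stream
    std::uint64_t mapped_bytes = 0;  // sum of (end - start) over parsed lines
};

// Read maps-style text one line at a time until getline() fails (EOF or error).
// Each line is expected to begin with a hex "start-end" range, as in the
// strace output quoted above ("7f0203235000-7f0203236000 ---p ...").
MapsSummary summarize_maps(std::istream& in) {
    MapsSummary s;
    std::string line;
    while (std::getline(in, line)) {
        ++s.lines;
        std::istringstream fields(line);
        std::uint64_t start = 0, end = 0;
        char dash = 0;
        // Lines that do not parse as "start-end" are counted but otherwise skipped.
        if ((fields >> std::hex >> start >> dash >> end) && dash == '-' && end >= start) {
            s.mapped_bytes += end - start;
        }
    }
    return s;
}

// Convenience wrapper (hypothetical): summarize the calling process's own maps.
MapsSummary summarize_self_maps() {
    std::ifstream f("/proc/self/maps");
    return summarize_maps(f);
}
```

In a sketch like this, every call rereads the whole file, so the cost of one pass grows with the number of mappings in the process. Logging the line count per pass would be one way to tell a single very long pass apart from a loop that never exits.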
