Run cat /proc/<pid>/limits for one of the OSD processes. You have probably hit the max processes limit or the max open files (FD) limit.
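
If you want to compare those limits against actual usage, here is a minimal Python sketch. It assumes the usual Linux layout of /proc/<pid>/limits (columns separated by runs of spaces) and that you run it as root or as the user owning the process. The function names parse_limits, usage and report are my own, not part of any Ceph tool. Keep in mind that "Max processes" is counted per user across all of that user's processes and threads, so the per-process thread count shown here is only part of that total.

```python
import os
import re


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}.

    Numeric limits become ints; "unlimited" becomes None.
    """
    def to_value(field):
        return None if field == "unlimited" else int(field)

    limits = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        # Columns are padded with spaces, so split on runs of 2+ spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3:
            continue
        limits[fields[0]] = (to_value(fields[1]), to_value(fields[2]))
    return limits


def usage(pid):
    """Return (open_fds, threads) for a process by listing its /proc entries."""
    open_fds = len(os.listdir("/proc/{0}/fd".format(pid)))
    threads = len(os.listdir("/proc/{0}/task".format(pid)))
    return open_fds, threads


def report(pid):
    """Build a short text report comparing usage with the soft limits."""
    with open("/proc/{0}/limits".format(pid)) as f:
        limits = parse_limits(f.read())
    open_fds, threads = usage(pid)
    return "\n".join([
        "open FDs: {0} (soft limit: {1})".format(
            open_fds, limits["Max open files"][0]),
        "threads in this process: {0} (per-user soft limit: {1})".format(
            threads, limits["Max processes"][0]),
    ])
```

For example, print(report("self")) shows the numbers for the Python interpreter itself; pass the PID of a ceph-osd process to check an OSD. A soft limit printed as None means "unlimited".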
> Hi Ceph-Users,
>
> I have absolutely no idea what is going on on my systems...
>
> Hardware:
> 45 x 4TB Harddisks
> 2 x 6 Core CPUs
> 256GB Memory
>
> When initializing all disks and joining them to the cluster, after
> approximately 30 OSDs, other OSDs crash. When I try to start them
> again I see different kinds of errors. For example:
>
>
> Starting Ceph osd.316 on ceph-osd-bs04...already running
> === osd.317 ===
> Traceback (most recent call last):
> File "/usr/bin/ceph", line 830, in <module>
> sys.exit(main())
> File "/usr/bin/ceph", line 773, in main
> sigdict, inbuf, verbose)
> File "/usr/bin/ceph", line 420, in new_style_command
> inbuf=inbuf)
> File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1112,
> in json_command
> raise RuntimeError('"{0}": exception {1}'.format(cmd, e))
> NameError: global name 'cmd' is not defined
> Exception thread.error: error("can't start new thread",) in <bound
> method Rados.__del__ of <rados.Rados object
> at 0x29ee410>> ignored
>
>
> or:
> /etc/init.d/ceph: 190: /etc/init.d/ceph: Cannot fork
> /etc/init.d/ceph: 191: /etc/init.d/ceph: Cannot fork
> /etc/init.d/ceph: 192: /etc/init.d/ceph: Cannot fork
>
> or:
> /usr/bin/ceph-crush-location: 72: /usr/bin/ceph-crush-location: Cannot fork
> /usr/bin/ceph-crush-location: 79: /usr/bin/ceph-crush-location: Cannot fork
> Thread::try_create(): pthread_create failed with error
> 11common/Thread.cc: In function 'void Thread::create(size_t)' thread
> 7fcf768c9760 time 2014-09-12 15:00:28.284735
> common/Thread.cc: 110: FAILED assert(ret == 0)
> ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
> 1: /usr/bin/ceph-conf() [0x51de8f]
> 2: (CephContext::CephContext(unsigned int)+0xb1) [0x520fe1]
> 3: (common_preinit(CephInitParameters const&, code_environment_t,
> int)+0x48) [0x52eb78]
> 4: (global_pre_init(std::vector<char const*, std::allocator<char
> const*> >*, std::vector<char const*, std::allocator<char const*> >&,
> unsigned int, code_environment_t, int)+0x8d) [0x518d0d]
> 5: (main()+0x17a) [0x514f6a]
> 6: (__libc_start_main()+0xfd) [0x7fcf7522ceed]
> 7: /usr/bin/ceph-conf() [0x5168d1]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> Aborted (core dumped)
> /etc/init.d/ceph: 340: /etc/init.d/ceph: Cannot fork
> /etc/init.d/ceph: 1: /etc/init.d/ceph: Cannot fork
> Traceback (most recent call last):
> File "/usr/bin/ceph", line 830, in <module>
> sys.exit(main())
> File "/usr/bin/ceph", line 590, in main
> conffile=conffile)
> File "/usr/lib/python2.7/dist-packages/rados.py", line 198, in __init__
> librados_path = find_library('rados')
> File "/usr/lib/python2.7/ctypes/util.py", line 224, in find_library
> return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
> File "/usr/lib/python2.7/ctypes/util.py", line 213, in
> _findSoname_ldconfig
> f = os.popen('/sbin/ldconfig -p 2>/dev/null')
> OSError: [Errno 12] Cannot allocate memory
>
> But anyways, when I look at the memory consumption of the system:
> # free -m
> total used free shared buffers cached
> Mem: 258450 25841 232609 0 18 15506
> -/+ buffers/cache: 10315 248135
> Swap: 3811 0 3811
>
>
> There are more than 230 GB of memory available! What is going on there?
> System:
> Linux ceph-osd-bs04 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1
> (2014-07-13) x86_64 GNU/Linux
>
> Since this is happening on other hardware as well, I don't think it's
> hardware related. I have no idea if this is an OS issue (which would be
> seriously strange) or a Ceph issue.
>
> Since this is happening only AFTER we upgraded to Firefly, I guess it
> has something to do with Ceph.
>
> ANY idea on what is going on here would be very appreciated!
>
> Regards,
> Christian
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: [email protected]
