Hi Karan,

We faced the same issue and resolved it by increasing the open file limit and the maximum number of threads.
Config reference:

/etc/security/limits.conf

root hard nofile 65535

sysctl -w kernel.pid_max=4194303

http://tracker.ceph.com/issues/10554#change-47024

Cheers
Mohamed Pakkeer

On Mon, Mar 9, 2015 at 4:20 PM, Azad Aliyar <[email protected]> wrote:

> *Check Max Threadcount:* If you have a node with a lot of OSDs, you may
> be hitting the default maximum number of threads (usually 32k),
> especially during recovery. You can use sysctl to raise the limit to the
> maximum allowed value (4194303) and see whether that helps. For example:
>
> sysctl -w kernel.pid_max=4194303
>
> If increasing the maximum thread count resolves the issue, you can make
> it permanent by including a kernel.pid_max setting in the /etc/sysctl.conf
> file. For example:
>
> kernel.pid_max = 4194303
>
>
> On Mon, Mar 9, 2015 at 4:11 PM, Karan Singh <[email protected]> wrote:
>
>> Hello Community, I need help to fix a long-running Ceph problem.
>>
>> The cluster is unhealthy and multiple OSDs are DOWN. When I try to
>> restart the OSDs I get this error:
>>
>> 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
>> common/Thread.cc: 129: FAILED assert(ret == 0)
>>
>> Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5,
>> kernel 3.17.2-1.el6.elrepo.x86_64
>>
>> Tried upgrading from 0.80.7 to 0.80.8, but no luck.
>>
>> Tried the CentOS stock kernel 2.6.32, but no luck.
>>
>> Memory is not a problem; more than 150 GB is free.
>>
>> Has anyone ever faced this problem?
>>
>> Cluster status:
>>
>>     cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
>>      health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
>>      monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
>>      osdmap e26633: 239 osds: 85 up, 196 in
>>       pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
>>             4699 GB used, 707 TB / 711 TB avail
>>             6061/31080 objects degraded (19.501%)
>>                   14 down+remapped+peering
>>                   39 active
>>                 3289 active+clean
>>                  547 peering
>>                  663 stale+down+peering
>>                  705 stale+active+remapped
>>                    1 active+degraded+remapped
>>                    1 stale+down+incomplete
>>                  484 down+peering
>>                  455 active+remapped
>>                 3696 stale+active+degraded
>>                    4 remapped+peering
>>                   23 stale+down+remapped+peering
>>                   51 stale+active
>>                 3637 active+degraded
>>                 3799 stale+active+clean
>>
>> OSD logs:
>>
>> 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
>> common/Thread.cc: 129: FAILED assert(ret == 0)
>>
>>  ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
>>  1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
>>  2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
>>  3: (Accepter::entry()+0x265) [0xb5c635]
>>  4: /lib64/libpthread.so.0() [0x3c8a6079d1]
>>  5: (clone()+0x6d) [0x3c8a2e89dd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> More information at Ceph Tracker Issue:
>> http://tracker.ceph.com/issues/10988#change-49018
>>
>> ****************************************************************
>> Karan Singh
>> Systems Specialist , Storage Platforms
>> CSC - IT Center for Science,
>> Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
>> mobile: +358 503 812758
>> tel. +358 9 4572001
>> fax +358 9 4572302
>> http://www.csc.fi/
>> ****************************************************************
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Warm Regards, Azad Aliyar
> Linux Server Engineer
> *Email* : [email protected] *|* *Skype* : spark.azad
> <http://www.sparksupport.com> <http://www.sparkmycloud.com>
> <https://www.facebook.com/sparksupport>
> <http://www.linkedin.com/company/244846>
> <https://twitter.com/sparksupport>
> 3rd Floor, Leela Infopark, Phase -2, Kakanad, Kochi-30, Kerala, India
> *Phone*: +91 484 6561696 , *Mobile*: 91-8129270421.
> *Confidentiality Notice:* Information in this e-mail is proprietary to
> SparkSupport. and is intended for use only by the addressed, and may
> contain information that is privileged, confidential or exempt from
> disclosure. If you are not the intended recipient, you are notified that
> any use of this information in any manner is strictly prohibited. Please
> delete this mail & notify us immediately at [email protected]

--
Thanks & Regards
K.Mohamed Pakkeer
Mobile- 0091-8754410114
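Before raising kernel.pid_max or the nofile limit as suggested in this thread, it may help to see how close an OSD node actually is to those ceilings. The following is only a rough sketch, assuming a Linux host with /proc mounted; the helper names usage_percent and count_threads are made up for illustration and are not part of Ceph or any standard tool.

```shell
#!/bin/sh
# Sketch: compare the live thread count and the open-file limits of this
# shell against the kernel ceiling discussed in the thread.

# Print the integer percentage of a limit that is in use ("n/a" if the
# limit is missing or zero). Hypothetical helper.
usage_percent() {
    if [ "$2" -gt 0 ] 2>/dev/null; then
        echo $(( $1 * 100 / $2 ))
    else
        echo "n/a"
    fi
}

# Sum the "Threads:" field over every process visible in /proc.
# Hypothetical helper.
count_threads() {
    total=0
    for status in /proc/[0-9]*/status; do
        # A process may exit between the glob and the read; count it as 0.
        n=$(awk '/^Threads:/ {print $2}' "$status" 2>/dev/null)
        total=$(( total + ${n:-0} ))
    done
    echo "$total"
}

# kernel.pid_max caps the number of PIDs and thread IDs on the node.
pid_max=$(cat /proc/sys/kernel/pid_max 2>/dev/null || echo 0)
threads=$(count_threads)

echo "kernel.pid_max:          $pid_max"
echo "threads in use:          $threads ($(usage_percent "$threads" "$pid_max")%)"
echo "open-file limit (soft):  $(ulimit -S -n)"
echo "open-file limit (hard):  $(ulimit -H -n)"
```

Note that ulimit reports the limits of the shell running the script, not of an already running ceph-osd daemon; for a running process, the effective values can be read from /proc/<pid>/limits.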
