Dumpling ceph-0.67.9-16.g69a99e6
I guess it shouldn’t be logging it at all? Thanks Jan > On 02 Jun 2015, at 20:42, Somnath Roy <somnath....@sandisk.com> wrote: > > Which code base are you using ? > > -----Original Message----- > From: Jan Schermer [mailto:j...@schermer.cz] > Sent: Tuesday, June 02, 2015 11:41 AM > To: Somnath Roy > Cc: ceph-users > Subject: Re: [ceph-users] ceph-mon logging like crazy because....? > > We actually have > "debug mon = 0” > > It was always spammy, but this is too spammy - on one mon the log size is > 500MB since morning. on other node it’s 17GB and about 16.5GB of that is > within one hour - something’s not right there and this is likely just a > symptom… > > Jan > > > On 02 Jun 2015, at 20:36, Somnath Roy <somnath....@sandisk.com> wrote: >> >> I think with the latest version of code it is printing only for log level 5, >> earlier it was 1. Here is the link where I had some conversation about this >> earlier with Sage. >> >> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881 >> >> So, IMO nothing to worry about other than log spam here which is fixed >> in the latest build or you can fix it with debug mon = 0/0 >> >> Thanks & Regards >> Somnath >> >> -----Original Message----- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf >> Of Jan Schermer >> Sent: Tuesday, June 02, 2015 11:33 AM >> To: ceph-users >> Subject: Re: [ceph-users] ceph-mon logging like crazy because....? >> >> Another follow-up. >> The whole madness started with “mon compact” which we run from cron (else >> leveldb eats all space). It’s been running for about 14 days now with no >> incident. >> >> 2015-06-02 16:40:01.624804 7f4309d45700 0 mon.node-14@2(peon) e3 >> handle_command mon_command({"prefix": "compact"} v 0) v1 >> 2015-06-02 16:40:23.646514 7f430a746700 1 >> mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) >> lease_timeout -- calling new election >> 2015-06-02 16:40:23.646947 7f4309d45700 1 >> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) >> is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.000000 has >> v0 lc 3659 >> 7321 >> 2015-06-02 16:40:23.646953 7f4309d45700 1 >> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) >> is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.000000 has >> v0 lc 3659 >> 7321 >> 2015-06-02 16:40:23.646960 7f4309d45700 1 >> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) >> is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.000000 has >> v0 lc 3659 >> 7321 >> 2015-06-02 16:40:23.646963 7f4309d45700 1 >> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) >> is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.000000 has >> v0 lc 3659 >> 7321 >> 2015-06-02 16:40:23.646968 7f4309d45700 1 >> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) >> is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.000000 has >> v0 lc 3659 >> 7321 >> 2015-06-02 16:40:23.646971 7f4309d45700 1 >> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) >> is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.000000 has >> v0 lc 3659 >> 7321 >> 2015-06-02 16:40:23.646976 7f4309d45700 1 >> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) >> is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.000000 has >> v0 lc 3659 >> 7321 >> 2015-06-02 16:40:23.646979 7f4309d45700 1 >> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) >> is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.000000 has >> v0 lc 3659 >> 7321 >> >> >> The sequence that follows is >> probing recovering >> electing recovering >> peon recovering >> peon active (and this is the madness) >> >> It logs much less now, but the issue is still here… >> >> Jan >> >>> On 02 Jun 2015, at 20:17, Jan Schermer <j...@schermer.cz> wrote: >>> >>> Actually looks like it stopped, but here’s a more representative >>> sample (notice how often it logged this!) >>> >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.865833 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.865860 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.865886 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.865944 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.865989 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.866025 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.866072 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.866121 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.866164 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.866166 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.866205 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.866207 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.866244 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.866246 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.866285 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.866287 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> 2015-06-02 17:39:59.866325 7f4309d45700 1 >>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) >>> is_readable now=2015-06-02 17:39:59.866327 lease_expire=2015-06-02 >>> 17:40:04.221316 has >>> v0 lc 36602135 >>> >>> >>> >>>> On 02 Jun 2015, at 20:14, Jan Schermer <j...@schermer.cz> wrote: >>>> >>>> Our mons just went into a logging frenzy. >>>> >>>> We have 3 mons in the cluster, and they mostly log stuff like this >>>> >>>> 2015-06-02 18:00:48.749386 7f1c08c0d700 1 >>>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) >>>> is_readable now=2015-06-02 18:00:48.749389 lease_expire=2015-06-02 >>>> 18:00:53.507837 has v0 lc 36604063 >>>> 2015-06-02 18:00:49.025179 7f1c08c0d700 1 >>>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) >>>> is_readable now=2015-06-02 18:00:49.025187 lease_expire=2015-06-02 >>>> 18:00:53.507837 has v0 lc 36604063 >>>> 2015-06-02 18:00:49.025640 7f1c08c0d700 1 >>>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) >>>> is_readable now=2015-06-02 18:00:49.025642 lease_expire=2015-06-02 >>>> 18:00:53.507837 has v0 lc 36604063 >>>> 2015-06-02 18:00:49.026132 7f1c08c0d700 1 >>>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) >>>> is_readable now=2015-06-02 18:00:49.026134 lease_expire=2015-06-02 >>>> 18:00:53.507837 has v0 lc 36604063 >>>> 2015-06-02 18:00:49.028388 7f1c08c0d700 1 >>>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) >>>> is_readable now=2015-06-02 18:00:49.028393 lease_expire=2015-06-02 >>>> 18:00:53.507837 has v0 lc 36604063 >>>> >>>> >>>> There are few lines every second, sometimes more, sometimes less >>>> (tell me if that’s normal. I’m not sure) >>>> >>>> Two of them went completely haywire, one log is 17GB now and rising. It’s >>>> still mostly the same content, just more frequent: >>>> >>>> 2015-06-02 18:09:00.879950 7f4309d45700 1 >>>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) >>>> is_readable now=2015-06-02 18:09:00.879956 lease_expire=0.000000 has >>>> v0 lc 36604772 >>>> 2015-06-02 18:09:00.879968 7f4309d45700 1 >>>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) >>>> is_readable now=2015-06-02 18:09:00.879969 lease_expire=0.000000 has >>>> v0 lc 36604772 >>>> 2015-06-02 18:09:00.954835 7f4309d45700 1 >>>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) >>>> is_readable now=2015-06-02 18:09:00.954843 lease_expire=0.000000 has >>>> v0 lc 36604772 >>>> 2015-06-02 18:09:00.954860 7f4309d45700 1 >>>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) >>>> is_readable now=2015-06-02 18:09:00.954861 lease_expire=0.000000 has >>>> v0 lc 36604772 >>>> 2015-06-02 18:09:01.249648 7f4309d45700 1 >>>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) >>>> is_readable now=2015-06-02 18:09:01.249668 lease_expire=2015-06-02 >>>> 18:09:06.091738 has v0 lc 36604773 >>>> 2015-06-02 18:09:01.249697 7f4309d45700 1 >>>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) >>>> is_readable now=2015-06-02 18:09:01.249699 lease_expire=2015-06-02 >>>> 18:09:06.091738 has v0 lc 36604773 >>>> 2015-06-02 18:09:01.249708 7f4309d45700 1 >>>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) >>>> is_readable now=2015-06-02 18:09:01.249709 lease_expire=2015-06-02 >>>> 18:09:06.091738 has v0 lc 36604773 >>>> 2015-06-02 18:09:01.249736 7f4309d45700 1 >>>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) >>>> is_readable now=2015-06-02 18:09:01.249736 lease_expire=2015-06-02 >>>> 18:09:06.091738 has v0 lc 36604773 >>>> >>>> >>>> Any idea what it might be? Clocks look synced, no other aparent problem >>>> that I can see, the cluster is working. >>>> I’d like to know why this happened before I restart the unhealthy mons >>>> which (I hope) will fix this. >>>> >>>> Thanks >>>> >>>> Jan >>>> >>> >> >> _______________________________________________ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> ________________________________ >> >> PLEASE NOTE: The information contained in this electronic mail message is >> intended only for the use of the designated recipient(s) named above. If the >> reader of this message is not the intended recipient, you are hereby >> notified that you have received this message in error and that any review, >> dissemination, distribution, or copying of this message is strictly >> prohibited. If you have received this communication in error, please notify >> the sender by telephone or e-mail (as shown above) immediately and destroy >> any and all copies of this message in your possession (whether hard copies >> or electronically stored copies). >> > _______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com