Dumpling

ceph-0.67.9-16.g69a99e6

I guess it shouldn’t be logging it at all?


Thanks
Jan


> On 02 Jun 2015, at 20:42, Somnath Roy <somnath....@sandisk.com> wrote:
> 
> Which code base are you using ?
> 
> -----Original Message-----
> From: Jan Schermer [mailto:j...@schermer.cz] 
> Sent: Tuesday, June 02, 2015 11:41 AM
> To: Somnath Roy
> Cc: ceph-users
> Subject: Re: [ceph-users] ceph-mon logging like crazy because....?
> 
> We actually have
> "debug mon = 0”
> 
> It was always spammy, but this is too spammy - on one mon the log size is 
> 500MB since morning. on other node it’s 17GB and about 16.5GB of that is 
> within one hour - something’s not right there and this is likely just a 
> symptom…
> 
> Jan
> 
> 
> On 02 Jun 2015, at 20:36, Somnath Roy <somnath....@sandisk.com> wrote:
>> 
>> I think with the latest version of code it is printing only for log level 5, 
>> earlier it was 1. Here is the link where I had some conversation about this 
>> earlier with Sage.
>> 
>> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/20881
>> 
>> So, IMO nothing to worry about other than log spam here which is fixed 
>> in the latest build or you can fix it with debug mon = 0/0
>> 
>> Thanks & Regards
>> Somnath
>> 
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
>> Of Jan Schermer
>> Sent: Tuesday, June 02, 2015 11:33 AM
>> To: ceph-users
>> Subject: Re: [ceph-users] ceph-mon logging like crazy because....?
>> 
>> Another follow-up.
>> The whole madness started with “mon compact” which we run from cron (else 
>> leveldb eats all space). It’s been running for about 14 days now with no 
>> incident.
>> 
>> 2015-06-02 16:40:01.624804 7f4309d45700  0 mon.node-14@2(peon) e3 
>> handle_command mon_command({"prefix": "compact"} v 0) v1
>> 2015-06-02 16:40:23.646514 7f430a746700  1 
>> mon.node-14@2(peon).paxos(paxos updating c 36596805..36597321) 
>> lease_timeout -- calling new election
>> 2015-06-02 16:40:23.646947 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646947 lease_expire=0.000000 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646953 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646954 lease_expire=0.000000 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646960 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646961 lease_expire=0.000000 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646963 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646964 lease_expire=0.000000 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646968 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646969 lease_expire=0.000000 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646971 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646972 lease_expire=0.000000 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646976 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646977 lease_expire=0.000000 has 
>> v0 lc 3659
>> 7321
>> 2015-06-02 16:40:23.646979 7f4309d45700  1 
>> mon.node-14@2(probing).paxos(paxos recovering c 36596805..36597321) 
>> is_readable now=2015-06-02 16:40:23.646980 lease_expire=0.000000 has 
>> v0 lc 3659
>> 7321
>> 
>> 
>> The sequence that follows is
>> probing recovering
>> electing recovering
>> peon recovering
>> peon active (and this is the madness)
>> 
>> It logs much less now, but the issue is still here…
>> 
>> Jan
>> 
>>> On 02 Jun 2015, at 20:17, Jan Schermer <j...@schermer.cz> wrote:
>>> 
>>> Actually looks like it stopped, but here’s a more representative 
>>> sample (notice how often it logged this!)
>>> 
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.865833 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.865834 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.865860 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.865861 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.865886 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.865887 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.865944 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.865946 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.865989 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.865992 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.866025 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.866027 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.866072 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.866074 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.866121 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.866123 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.866164 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.866166 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.866205 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.866207 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.866244 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.866246 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.866285 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.866287 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 2015-06-02 17:39:59.866325 7f4309d45700  1 
>>> mon.node-14@2(peon).paxos(paxos active c 36601574..36602135) 
>>> is_readable now=2015-06-02 17:39:59.866327 lease_expire=2015-06-02
>>> 17:40:04.221316 has
>>> v0 lc 36602135
>>> 
>>> 
>>> 
>>>> On 02 Jun 2015, at 20:14, Jan Schermer <j...@schermer.cz> wrote:
>>>> 
>>>> Our mons just went into a logging frenzy.
>>>> 
>>>> We have 3 mons in the cluster, and they mostly log stuff like this
>>>> 
>>>> 2015-06-02 18:00:48.749386 7f1c08c0d700  1 
>>>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) 
>>>> is_readable now=2015-06-02 18:00:48.749389 lease_expire=2015-06-02
>>>> 18:00:53.507837 has v0 lc 36604063
>>>> 2015-06-02 18:00:49.025179 7f1c08c0d700  1 
>>>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) 
>>>> is_readable now=2015-06-02 18:00:49.025187 lease_expire=2015-06-02
>>>> 18:00:53.507837 has v0 lc 36604063
>>>> 2015-06-02 18:00:49.025640 7f1c08c0d700  1 
>>>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) 
>>>> is_readable now=2015-06-02 18:00:49.025642 lease_expire=2015-06-02
>>>> 18:00:53.507837 has v0 lc 36604063
>>>> 2015-06-02 18:00:49.026132 7f1c08c0d700  1 
>>>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) 
>>>> is_readable now=2015-06-02 18:00:49.026134 lease_expire=2015-06-02
>>>> 18:00:53.507837 has v0 lc 36604063
>>>> 2015-06-02 18:00:49.028388 7f1c08c0d700  1 
>>>> mon.node-10@1(peon).paxos(paxos active c 36603331..36604063) 
>>>> is_readable now=2015-06-02 18:00:49.028393 lease_expire=2015-06-02
>>>> 18:00:53.507837 has v0 lc 36604063
>>>> 
>>>> 
>>>> There are few lines every second, sometimes more, sometimes less 
>>>> (tell me if that’s normal. I’m not sure)
>>>> 
>>>> Two of them went completely haywire, one log is 17GB now and rising. It’s 
>>>> still mostly the same content, just more frequent:
>>>> 
>>>> 2015-06-02 18:09:00.879950 7f4309d45700  1 
>>>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) 
>>>> is_readable now=2015-06-02 18:09:00.879956 lease_expire=0.000000 has
>>>> v0 lc 36604772
>>>> 2015-06-02 18:09:00.879968 7f4309d45700  1 
>>>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) 
>>>> is_readable now=2015-06-02 18:09:00.879969 lease_expire=0.000000 has
>>>> v0 lc 36604772
>>>> 2015-06-02 18:09:00.954835 7f4309d45700  1 
>>>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) 
>>>> is_readable now=2015-06-02 18:09:00.954843 lease_expire=0.000000 has
>>>> v0 lc 36604772
>>>> 2015-06-02 18:09:00.954860 7f4309d45700  1 
>>>> mon.node-14@2(peon).paxos(paxos updating c 36604084..36604772) 
>>>> is_readable now=2015-06-02 18:09:00.954861 lease_expire=0.000000 has
>>>> v0 lc 36604772
>>>> 2015-06-02 18:09:01.249648 7f4309d45700  1 
>>>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) 
>>>> is_readable now=2015-06-02 18:09:01.249668 lease_expire=2015-06-02
>>>> 18:09:06.091738 has v0 lc 36604773
>>>> 2015-06-02 18:09:01.249697 7f4309d45700  1 
>>>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) 
>>>> is_readable now=2015-06-02 18:09:01.249699 lease_expire=2015-06-02
>>>> 18:09:06.091738 has v0 lc 36604773
>>>> 2015-06-02 18:09:01.249708 7f4309d45700  1 
>>>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) 
>>>> is_readable now=2015-06-02 18:09:01.249709 lease_expire=2015-06-02
>>>> 18:09:06.091738 has v0 lc 36604773
>>>> 2015-06-02 18:09:01.249736 7f4309d45700  1 
>>>> mon.node-14@2(peon).paxos(paxos active c 36604084..36604773) 
>>>> is_readable now=2015-06-02 18:09:01.249736 lease_expire=2015-06-02
>>>> 18:09:06.091738 has v0 lc 36604773
>>>> 
>>>> 
>>>> Any idea what it might be? Clocks look synced, no other aparent problem 
>>>> that I can see, the cluster is working.
>>>> I’d like to know why this happened before I restart the unhealthy mons 
>>>> which (I hope) will fix this.
>>>> 
>>>> Thanks
>>>> 
>>>> Jan
>>>> 
>>> 
>> 
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> ________________________________
>> 
>> PLEASE NOTE: The information contained in this electronic mail message is 
>> intended only for the use of the designated recipient(s) named above. If the 
>> reader of this message is not the intended recipient, you are hereby 
>> notified that you have received this message in error and that any review, 
>> dissemination, distribution, or copying of this message is strictly 
>> prohibited. If you have received this communication in error, please notify 
>> the sender by telephone or e-mail (as shown above) immediately and destroy 
>> any and all copies of this message in your possession (whether hard copies 
>> or electronically stored copies).
>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to