Are there any additional messages prior to this, indicating what might have caused it?
Marcin On 04.12.2017 09:32, Chaigneau, Nicolas wrote: > Hello, > > > Got another occurence of the issue, on 2017 December 1st at 09:12. > > This time I had the server running with debug logs. > I noticed that, not only did the LFC stopped being executed, but also the > lease reclaiming mechanism (both at the same time). > > So I think that the issue is bigger, that all timers stop working. :( > > > egrep "DHCPSRV_MEMFILE_LFC_START|DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION" > kea1-open-20171202.log > > (...) > 2017-12-01 09:12:42.376 DEBUG [kea-dhcp4.dhcpsrv/32013] > DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: memfile-lfc > 2017-12-01 09:12:42.376 INFO [kea-dhcp4.dhcpsrv/32013] > DHCPSRV_MEMFILE_LFC_START starting Lease File Cleanup > 2017-12-01 09:12:43.240 DEBUG [kea-dhcp4.dhcpsrv/32013] > DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: > reclaim-expired-leases > (...) > 2017-12-01 09:13:40.255 DEBUG [kea-dhcp4.dhcpsrv/32013] > DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: > reclaim-expired-leases > 2017-12-01 09:13:41.256 DEBUG [kea-dhcp4.dhcpsrv/32013] > DHCPSRV_TIMERMGR_RUN_TIMER_OPERATION running operation for timer: > reclaim-expired-leases > > No more timer-related logs after this point. > (Note that the server is still processing DHCP packets correctly.) > > > > > Regards, > Nicolas. > >> >> OK, thanks for the update! >> >> For the record, the issue can appear whether Kea receives DHCP packets or >> not. >> So you can just start the server and wait :) >> >> >> >> >>> I've been running Kea with LFC for 2 days, simulating DHCP requests at >>> the rate of 1 lease/sec with valid lifetime of 60 seconds. I haven't >>> reproduced the problem so far. >>> >>> Marcin >>> >>> On 30.11.2017 09:28, Chaigneau, Nicolas wrote: >>>> >>>> >>>> >>>> I've restarted Kea a week ago with debug logs, in the hope of seeing >>>> something when the issue manifests. >>>> >>>> No "luck" so far. Still waiting... >>>> >>>> >>>> >>>> >>>>> The worker thread seems to be running. >>>>> The following command (on the Kea process not executing LFC) shows two >>>>> threads: >>>>> >>>>> ps -T -p <pid of Kea> >>>>> PID SPID TTY TIME CMD >>>>> 28268 28268 ? 00:00:03 kea-dhcp4 >>>>> 28268 28269 ? 00:00:03 kea-dhcp4 >>>>> >>>>> >>>>> The issue also happened on our lab, so I could try to upgrade to 1.3.0. >>>>> >>>>> But I would really prefer to find out the issue with 1.2.0 and fix this >>>>> version. >>>>> We've just deployed 1.2.0 on production, having to upgrade now would >>>>> be... difficult :/ >>>>> >>>>> >>>>> >>>>>> It would be useful to check if the worker thread triggering timers is >>>>>> still running or died. The kea-dhcp4 process should run main thread and >>>>>> the second thread for timers. You could use "ps" or "top" to verify this. >>>>>> >>>>>> In the Kea 1.3.0, we have removed the worker thread and made several >>>>>> significant updates to the timers scheduling. It would be useful it you >>>>>> could verify whether Kea 1.3.0 also has this issue, but if it is >>>>>> production environment, it is not good for experiments. >>>>>> >>>>>> Marcin >>>>>> >>>>>> On 20.11.2017 10:21, Chaigneau, Nicolas wrote: >>>>>>> The Kea server not executing LFC is still running, so if there's >>>>>>> something you want me to check... feel free to ask :) >>>>>>> >>>>>>> >>>>>>> >>>>>>> -----Message d'origine----- >>>>>>> De : Marcin Siodelski [mailto:[email protected]] >>>>>>> >>>>>>> Nicolas, >>>>>>> >>>>>>> Thanks for creating the ticket. We'll have a look and see if we can >>>>>>> reproduce it here. >>>>>>> >>>>>>> Marcin Siodelski >>>>>>> ISC >>>>>>> >>>>>>> On 20.11.2017 09:07, Chaigneau, Nicolas wrote: >>>>>>>> >>>>>>>> Observed a 4th occurrence of the issue. >>>>>>>> >>>>>>>> Server started on 2017, November 15th at 17:20:47 >>>>>>>> Last time LFC was executed: on November 19th at 04:50:47 >>>>>>>> >>>>>>>> I'll open a ticket. >>>>>>>> >>>>>>>> >>>>>>>> Regards, >>>>>>>> Nicolas. >>>>>>>> >>>>>>>>> >>>>>>>>> So this is confirmed: LFC is really not executed anymore at some >>>>>>>>> point, for a reason yet to be determined. >>>>>>>>> >>>>>>>>> I'll have a look at the changes between Kea 0.9.1 and 1.2.0. >>>>>>>>> >>>>>>>>> >>>>>>>>> If you think of anything that could help, I'm all ears :) >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Nicolas. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Hello Marcin, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks for your answer. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I was too hasty to suspect the log rotation, this has nothing to do >>>>>>>>>> with it. Rotation occurs at night between 3 and 4am, and the last >>>>>>>>>> log I have was in the middle of the day. >>>>>>>>>> >>>>>>>>>> Furthermore, I think the issue manifesting after 24H is just a >>>>>>>>>> coincidence. Before that I've had Kea run for several days before >>>>>>>>>> seeing the issue. So it seems random... >>>>>>>>>> >>>>>>>>>> I don't think there's an issue with logging either, because when I >>>>>>>>>> stopped the server after the LFC issue appeared, then I got the >>>>>>>>>> normal shutdown logs. >>>>>>>>>> >>>>>>>>>> I think the LFC is not executed because of the two observations: >>>>>>>>>> - No log from Kea saying that LFC is launched >>>>>>>>>> - Lease files are not modified anymore (and their last modification >>>>>>>>>> date coincides with the last "DHCPSRV_MEMFILE_LFC_START" log from >>>>>>>>>> Kea) >>>>>>>>>> >>>>>>>>>> But you're right, I'm not 100% sure LFC is not executed... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> So I've launched a little script that does loops over a "ps" to try >>>>>>>>>> and catch the LFC process running. >>>>>>>>>> It's not fully reliable because if LFC has nothing to do, it may >>>>>>>>>> start and end so fast that the ps won't see it. >>>>>>>>>> I tried that with LFC scheduled every 10 seconds. The script catches >>>>>>>>>> about 2 LFC executions out of 3. >>>>>>>>>> >>>>>>>>>> So on the live system on which the LFC appears to not run anymore, >>>>>>>>>> I'll let the script run for an hour. In this time frame LFC is >>>>>>>>>> normally executed 6 times. >>>>>>>>>> If I don't see anything, I'll conclude that LFC is indeed not >>>>>>>>>> executed. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Nicolas. >>>>>>>>>> >>>>>>>>>> -----Message d'origine----- >>>>>>>>>> De : Marcin Siodelski [mailto:[email protected]] >>>>>>>>>> >>>>>>>>>> Nicolas, >>>>>>>>>> >>>>>>>>>> I think we should start from checking if the issue doesn't stem from >>>>>>>>>> the >>>>>>>>>> log rotation. If I understand correctly, you're rotating the log file >>>>>>>>>> daily. Per your message below, things go weird after around 24 hours >>>>>>>>>> since server startup. Is it possible to verify that rotating every >>>>>>>>>> 12h >>>>>>>>>> would cause the issue to begin after 12h rather than 24h? >>>>>>>>>> >>>>>>>>>> I can't see anything in the Kea code that could prevent periodic >>>>>>>>>> execution of LFC, other than server reconfiguration. You're saying >>>>>>>>>> that >>>>>>>>>> LFC is not executed because the lease files aren't modified. But, is >>>>>>>>>> there any other log output present there? Perhaps the LFC is not >>>>>>>>>> executed because of some trouble with logging being a result of log >>>>>>>>>> file >>>>>>>>>> rotation? We should see if logging generally works fine after log >>>>>>>>>> file >>>>>>>>>> rotation in the first place. Could you enable more verbose logging to >>>>>>>>>> see if any log is produced when the LFC starting message is gone? >>>>>>>>>> >>>>>>>>>> Marcin Siodelski >>>>>>>>>> ISC >>>>>>>>>> >>>>>>>>>> On 17.11.2017 10:41, Chaigneau, Nicolas wrote: >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I've observed the issue again. >>>>>>>>>>> >>>>>>>>>>> With a Kea server that does not receive any DHCP packet: >>>>>>>>>>> At some point, Kea is still running but LFC is no longer executed. >>>>>>>>>>> >>>>>>>>>>> In Kea log file we don't have "DHCPSRV_MEMFILE_LFC_START" messages >>>>>>>>>>> anymore. >>>>>>>>>>> And the lease files are not modified on disk (they would be if LFC >>>>>>>>>>> was executed, so this is not just a logging issue). >>>>>>>>>>> >>>>>>>>>>> Kea was started on 2017, November 15th at 16:20:23. >>>>>>>>>>> LFC was first executed 10 minutes later, at 16:30:23. >>>>>>>>>>> And the last execution of LFC: November 16th, at 16:40:23. >>>>>>>>>>> Nothing after that... >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Nicolas. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Wait... when I restarted the server, I got logs again... before >>>>>>>>>>>> the shutdown! >>>>>>>>>>>> So logging actually still worked. I just didn't have anything >>>>>>>>>>>> being logged. >>>>>>>>>>>> >>>>>>>>>>>> 2017-11-15 17:20:45.551 INFO [kea-dhcp4.dhcp4/29754] >>>>>>>>>>>> DHCP4_SHUTDOWN server shutdown >>>>>>>>>>>> 2017-11-15 17:20:47.603 INFO [kea-dhcp4.dhcp4/28141] >>>>>>>>>>>> DHCP4_STARTING Kea DHCPv4 server version 1.2.0 starting >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> The only logs I normally have in Kea logs is LFC execution every >>>>>>>>>>>> 10 minutes: >>>>>>>>>>>> >>>>>>>>>>>> 2017-11-16 02:10:47.617 INFO [kea-dhcp4.dhcpsrv/28141] >>>>>>>>>>>> DHCPSRV_MEMFILE_LFC_START starting Lease File Cleanup >>>>>>>>>>>> 2017-11-16 02:10:47.617 INFO [kea-dhcp4.dhcpsrv/28141] >>>>>>>>>>>> DHCPSRV_MEMFILE_LFC_EXECUTE executing Lease File Cleanup (...) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I don't have these logs anymore after some time. >>>>>>>>>>>> So... either LFC is no longer executed after a while, or Kea stops >>>>>>>>>>>> writing these log messages ? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> A few more detail: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> This worked fine before (with the same logrotate configuration), >>>>>>>>>>>>> with: >>>>>>>>>>>>> - Kea 0.9.1 >>>>>>>>>>>>> - RHEL 6.4 >>>>>>>>>>>>> >>>>>>>>>>>>> The issue now appears with: >>>>>>>>>>>>> - Kea 1.2.0 >>>>>>>>>>>>> - RHEL 7.1 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Should I open a ticket for this ? (I'm not sure the problem is >>>>>>>>>>>>> with Kea, maybe it's logrotate...) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I've set up Kea log rotation with logrotate. >>>>>>>>>>>>>> This works for some time, and then it stops working: the new log >>>>>>>>>>>>>> file stays empty, until the server is restarted. >>>>>>>>>>>>>> I'm not sure what's going on... has anyone else had this issue ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm using the following options for logrotate: >>>>>>>>>>>>>> >>>>>>>>>>>>>> daily >>>>>>>>>>>>>> rotate 30 >>>>>>>>>>>>>> missingok >>>>>>>>>>>>>> nocompress >>>>>>>>>>>>>> copytruncate >>>>>>>>>>>>>> dateext >>>>>>>>>>>>>> extension .log >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I thought "copytruncate" would be good enough, but apparently it >>>>>>>>>>>>>> is not. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is it possible to send a signal to Kea so that it reopens its >>>>>>>>>>>>>> log file ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Nicolas. >> > > This message contains information that may be privileged or confidential and > is the property of the Capgemini Group. It is intended only for the person to > whom it is addressed. If you are not the intended recipient, you are not > authorized to read, print, retain, copy, disseminate, distribute, or use this > message or any part thereof. If you receive this message in error, please > notify the sender immediately and delete all copies of this message. > _______________________________________________ Kea-users mailing list [email protected] https://lists.isc.org/mailman/listinfo/kea-users
