Thanks for the interesting puzzle to crunch! Looking at these few bread-crumbs, I wager an educated guess that this loops in `sendcmd()` (where CLI child processes talk to a daemonized copy which tracks the timers for events), around here: https://github.com/networkupstools/nut/blob/v2.8.0/clients/upssched.c#L583-L626 for the daemon and here for the child: https://github.com/networkupstools/nut/blob/v2.8.0/clients/upssched.c#L762-L787
In both cases we run an infinite loop with `select()` to wait up to 1 second and `read()` from the pipe. The common expectation is that if there is nothing to read, the program blocks in `select()` waiting for a reply from the kernel, and only returns after the specified timeout - to the point that some `sleep()` with sub-second precision implementations do just that on systems that lack `usleep()`. With `select()` in your capture posting an array of FDs `[7 10]` I'd say it is the parent loop, and one of those is the `pipefd` and another is the FD for socket connection to talk to a particular child on the other side of the pipe. Here I must guess that either the `pselect6()` implementation behaves differently which would be surprising after decades worth of expected behavior, or that it for some reason decides that the FD #7 (whoever that is) raised the hand but was then silent (0 bytes read). The next clue is 128 reads, which leads to https://github.com/networkupstools/nut/blob/v2.8.0/clients/upssched.c#L485 `sock_read()` method which indeed does up to `US_MAX_READ==128` attempts to read one byte from a connection (so from the CLI/child counterpart) at a time - so that is who FD#7 gotta be. So it is just at this point that I get stuck explaining what happens and why :) Possibly there was a child or CLI counterpart process which died and somehow the connection did not get closed (which would return a `read()` error and evict it from the conn list), e.g. frozen/zombified, or something else connected to the pipe (antivirus heh?) and loiters there, or something else I could not think of quickly... The next clue could be to look at who is connected to the pipe file, maybe with `fuser`, or look in `ps -ef` if there are other `upssched` processes present, or look in `/proc/PIDNUM/fd/` of the daemonized and looping `upssched`... If anyone decides to track through the code, the potential processing loop goes checkconf -> conf_arg -> parse_at -> sendcmd -> daemon if needed -> start_daemon -> conn_add(pipefd) or sock_read(conn) with short-lived child processes (command-line launched from `upsmon`) parsing the configuration to pick the part relevant to them per envvars, and talk to the daemon (fork one if needed) to start/cancel the timers. Jim On Tue, Jun 13, 2023 at 8:33 AM Kari Lempiäinen < [email protected]> wrote: > Hi, > > I ran the strace command while upssched was 100% CPU hungry. This is what > I got: > > 1686633611.702798 read(7, "", 1) = 0 <0.000004> > 1686633611.702816 read(7, "", 1) = 0 <0.000004> > 1686633611.702834 pselect6(11, [7 10], NULL, NULL, {tv_sec=1, tv_nsec=0}, > NULL) = 1 (in [7], left {tv_sec=0, tv_nsec=999998800}) <0.000006> > 1686633611.702862 read(7, "", 1) = 0 <0.000004> > 1686633611.702880 read(7, "", 1) = 0 <0.000004> > . > . > . > 1686633611.705206 read(7, "", 1) = 0 <0.000004> > 1686633611.705224 read(7, "", 1) = 0 <0.000004> > 1686633611.705242 pselect6(11, [7 10], NULL, NULL, {tv_sec=1, tv_nsec=0}, > NULL) = 1 (in [7], left {tv_sec=0, tv_nsec=999999204}) <0.000005> > 1686633611.705268 read(7, "", 1) = 0 <0.000004> > 1686633611.705288 read(7, "", 1) = 0 <0.000302> > > there are 128 "read(7" lines between every pselect6 lines (I did count the > read lines for couple of blocks, 128 in each of them). Nothing else in the > strace log, just read and pselect lines for the short time I let strace to > run. > > Any help from this? > > Best regards, > Kari > ------------------------------ > > *<more history snapped>* >
_______________________________________________ Nut-upsdev mailing list [email protected] https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsdev
