Hello,

TL;DR: we hit the same bug using backuppc, and found that upstream rsync
commit f9bb8f76ee is the culprit.  Unfortunately the bug only occurs on
some Debian 11 machines, not all.

We are seeing the same problem when using backuppc to back up a Debian 11
machine: the backup never finishes, because the rsync process on the target
machine ends up doing nothing at all (not using any CPU).  Rsync on the
target machine is version 3.2.3-4+deb11u1 (standard Debian package).

Debugging with strace on the target Debian 11 machine shows that rsync
scans the whole filesystem as expected, but then gets stuck doing nothing
in a select loop, forever:

    strace: Process 4104384 attached
    select(1, [0], [], [0], {tv_sec=52, tv_usec=637639}) = 0 (Timeout)
    select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
    select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
    select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)
    select(1, [0], [], [0], {tv_sec=60, tv_usec=0}) = 0 (Timeout)

Given this behaviour, it seems likely that rsync ends up running the
"noop_io_until_death()" function and loops there forever.
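
To make the trace easier to read, here is a minimal sketch of a loop that
produces the same syscall pattern.  It is not rsync's code: the function
name count_idle_timeouts and the use of a pipe in place of stdin are made
up for illustration, and the timeout is shortened so it finishes quickly.

```python
import os
import select


def count_idle_timeouts(fd, timeout_s, max_rounds):
    """Poll fd as in the strace output; return how many polls timed out."""
    timeouts = 0
    for _round in range(max_rounds):
        # Same shape as select(1, [0], [], [0], {timeout}) in the trace:
        # watch one descriptor for readability and exceptional conditions.
        readable, _writable, exceptional = select.select(
            [fd], [], [fd], timeout_s
        )
        if readable or exceptional:
            break  # the peer sent data or closed the pipe: the wait ends
        timeouts += 1  # nothing happened: strace shows "= 0 (Timeout)"
    return timeouts


if __name__ == "__main__":
    # A pipe that nobody writes to stands in for the idle stdin (fd 0).
    read_end, write_end = os.pipe()
    print(count_idle_timeouts(read_end, 0.1, 5))  # prints 5
    os.close(read_end)
    os.close(write_end)
```

As long as the other side keeps the connection open and sends nothing,
every call times out, which matches a process that uses no CPU.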

Thanks to a previous comment from Andre, I focused on the git history
between 3.2.2 and 3.2.3 and found this commit:

    f9bb8f76ee ("Change daemon variable & simplify some option code")

and specifically this change that introduces a new condition to keep
running noop_io_until_death():

    https://github.com/WayneD/rsync/commit/f9bb8f76ee728bd1391a2b4890ce0281457a7bf2#diff-92194f057884b3287a3a6bf84e6e3b2bf433a556b68562799252a091744e7854L920-L921

Reverting f9bb8f76ee on top of the Debian package indeed fixes the issue
for us on this machine.

The strange thing is: on another Debian 11 machine that is configured very
similarly (same rsync version, same backuppc server) but has different
data, we could not reproduce the bug at all.

I'm not familiar with rsync internals, but my guess from reading the code is:

- for some reason, possibly due to the way rsync is started by backuppc,
  daemon_connection is sometimes set to -1 ("daemon via socket")

- when rsync has finished working, it goes into the noop_io_until_death()
  loop because of the new condition in 3.2.3, waiting for a signal

- but the signal never comes, so rsync never exits

- this causes backuppc to wait forever for rsync to finish
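
The chain of events guessed above can be modelled with a toy parent and
child process.  This is only a sketch of the hypothesis, assuming a POSIX
system: CHILD_CODE and child_hangs_until_signalled are hypothetical names,
the child is a stand-in for the idle rsync and the parent for backuppc,
and nothing here is taken from either program.

```python
import subprocess
import sys

# Hypothetical stand-in for the stuck rsync: it polls stdin forever and
# only goes away when it receives a signal.
CHILD_CODE = (
    "import select\n"
    "while True:\n"
    "    select.select([0], [], [0], 0.05)\n"
)


def child_hangs_until_signalled(wait_s):
    """Return (hung, exit_code) for a child that idles until signalled."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE], stdin=subprocess.PIPE
    )
    try:
        child.wait(timeout=wait_s)
        hung = False  # the child exited on its own
    except subprocess.TimeoutExpired:
        hung = True  # what the waiting side sees, except it has no timeout
        child.terminate()  # the signal that, in the guess above, never comes
        child.wait()
    child.stdin.close()
    return hung, child.returncode


if __name__ == "__main__":
    print(child_hangs_until_signalled(0.5))
```

Without the terminate() call the parent would wait indefinitely, which is
the behaviour we observe from backuppc.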

For reference, this is how rsync is started by backuppc through SSH:

    /usr/bin/ssh -q -x -o StrictHostKeyChecking=no -l backup $CLIENT_HOSTNAME \
        nice -n 19 sudo /usr/bin/rsync --server --sender --numeric-ids \
        --perms --owner --group -D --links --hard-links --times \
        --block-size=2048 --recursive --checksum-seed=32761 . /

I have opened an upstream bug report here: 
https://github.com/WayneD/rsync/issues/256

Thanks,
Baptiste
