We are facing the same problem under Ubuntu 22.04 that was reported by
Thomas.

A colleague was able to capture an strace of the hanging process:

$ strace reprepro update
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "data/apb/tests/test.yml.j2\n13170"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "galaxy/data/network/tests/test.y"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "sic.cs\n490a99c61905558eaa4d94e0c"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "dule_utils/facts/network/nvme.py"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "/module_utils/facts/virtual/sysc"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "-packages/ansible/modules/expect"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, " usr/lib/python3/dist-packages/a"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "3/dist-packages/ansible/playbook"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "packages/ansible/plugins/callbac"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "b099d4bd46482713679e6ec6db7 usr"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "rminal/__init__.py\n33c60e3c9f057"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "st-packages/ansible/vars/hostvar"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "ons/amazon/aws/docs/amazon.aws.e"..., 4096) = 4096
poll([{fd=15, events=POLLIN}], 1, 0) = 1 ([{fd=15, revents=POLLIN}])
read(15, "count_attribute.py\nc0309eee8189f"..., 4096) = 2048
poll([{fd=15, events=POLLIN}], 1, 0) = 0 (Timeout)
wait4(88839,
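
The last poll line of the trace can be reproduced in isolation. The following is a minimal sketch, not reprepro code: readable_now is a hypothetical helper that polls with a zero timeout, as in the trace. It reports "nothing to read" whenever the pipe is momentarily empty, even if the writer is still running and will produce more output later.

```c
#include <poll.h>

/* Hypothetical helper (not part of reprepro): non-blocking readiness check.
 * Returns 1 if data can be read right now, 0 if not (strace shows this as
 * "Timeout"), and -1 if poll() itself failed. */
static int readable_now(int fd) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, 0);   /* timeout 0: return immediately */
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A result of 0 therefore says nothing about whether the writer has finished; only a read() returning 0 (EOF) does.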

The process seems to hang around the "poll" call in the newly introduced
static function "drain_pipe_fd" in uncompression.c: in the trace above, the
last poll returns 0 (Timeout) and the following wait4 never returns. We have
modified the function's implementation as follows:

static inline retvalue drain_pipe_fd(struct compressedfile *file,
        int *errno_p, const char **msg_p) {
    int e = 0;
    struct pollfd pollfd = {
        file->fd,
        POLLIN,
        0
    };
    unsigned char buffer[4096] = {};
    /* Keep looping even when poll() reports 0 (timeout): the read() below
     * then blocks until more data or EOF arrives. */
    while ((e = poll(&pollfd, 1, 0)) >= 0) {
        /* Stop on a pipe error or hang-up. */
        if (pollfd.revents & POLLERR || pollfd.revents & POLLHUP)
            return RET_ERRNO(file->error);

        e = read(file->fd, buffer, 4096);
        if (e <= 0)
            break;
    }
    if (e < 0) {
        *errno_p = e;
        *msg_p = strerror(file->error);
        return RET_ERRNO(e);
    }
    return RET_OK;
}
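
For comparison, the underlying idea (keep reading until the child closes its end of the pipe, so that it cannot block on a full pipe before it is waited for) can be sketched as a standalone function. This is only an illustration under assumptions, not reprepro code: drain_until_eof is a hypothetical name, it uses a plain blocking read() without poll(), and it reports the errno value rather than the -1 returned by read().

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (not part of reprepro): read and discard everything
 * from fd until all writers have closed their end.
 * Returns 0 on EOF, or the errno value of the failing read(). */
static int drain_until_eof(int fd) {
    unsigned char buffer[4096];
    for (;;) {
        ssize_t n = read(fd, buffer, sizeof buffer);
        if (n > 0)
            continue;       /* discard the data and keep reading */
        if (n == 0)
            return 0;       /* EOF: every write end has been closed */
        if (errno == EINTR)
            continue;       /* interrupted by a signal, retry */
        return errno;       /* real error: report errno, not -1 */
    }
}
```

Calling such a function before waiting for the child means the wait can only start once the child has closed its stdout.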

Now, the update process works without any errors or hanging processes (tested
on 5.4.2). We tried to mirror the official repositories for Ubuntu Focal, Jammy
and Noble, but we got some "Too many open files" errors during exporting. We
have increased the open file limit and modified the surrounding code to
ensure that the function cannot return without closing the file descriptor:

if (file->pipeinfd != -1)
    (void)close(file->pipeinfd);
output_fd = file->fd;
// Drain the child's stdout in the unlikely case it's blocking on it
e = drain_pipe_fd(file, errno_p, msg_p);
if (e != RET_OK) {
    (void)close(output_fd);
    return e;
}

I must admit that my colleague and I are no C experts, and we are not sure
whether this fix works well under production conditions. We hope that it helps
to narrow down the problem and to release a fix soon.

Regards,
Christoph
