Package: apt-cacher-ng Version: 3.7.4-1 Since quite many months I see occasionally (~ once a month) this cron job warning: /etc/cron.daily/apt-cacher-ng: Aborted run-parts: /etc/cron.daily/apt-cacher-ng exited with return code 134
I originally assumed it was because of missing large-file-support (see bug #1020909), but it turns out to be some kind of time-out. To find the cause of this error I modified /etc/cron.daily/apt-cacher-ng like this and had to wait some weeks until the error occured: ---------------------- echo "Debug info added to /etc/cron.daily/apt-cacher-ng" echo "Helge added debug info to /etc/cron.daily/apt-cacher-ng to find cause for this: /etc/cron.daily/apt-cacher-ng exited with return code 134" if [ -d /var/log/apt-cacher-ng ]; then echo "clean dir: /var/log/apt-cacher-ng/" ls -la /var/log/apt-cacher-ng/ echo "INODES" ls -li /var/log/apt-cacher-ng/ echo "FIND 1:" # rm older than 10 days find /var/log/apt-cacher-ng -maxdepth 1 -name 'maint_*.log.html.xz' -or -name 'maint_*.log.html' -type f -user apt-cacher-ng -mtime +10 -delete echo "XZ 1:" # compress older than 5 days find /var/log/apt-cacher-ng -maxdepth 1 -name 'maint_*.log.html' -type f -user apt-cacher-ng -mtime +5 -print0 | xargs -r0 xz echo "clean dir done" fi rpage=$(/usr/lib/apt-cacher-ng/acngtool -c /etc/apt-cacher-ng 2>/dev/null printvar ReportPage) echo "rpage is: $rpage" if test 0 = $(echo "$rpage" | wc -w) then echo Warning: ReportPage setting of apt-cacher-ng was disabled by administrator. >&2 echo See /usr/share/doc/apt-cacher-ng/README.Debian for details. >&2 exit 42 fi echo "Now start scngtool maint:" strace -f /usr/lib/apt-cacher-ng/acngtool maint -c /etc/apt-cacher-ng SocketPath=/var/run/apt-cacher-ng/socket 2>&1 echo "done." ------------------------ The important part is the added "strace" at the end to see what "/usr/lib/apt-cacher-ng/acngtool maint" is doing. I've added the full output of this faulty run to this bug report. I think the interesting part is this strace output: ... [pid 4533] read(9, "<b>Bringing index files up to da"..., 30000) = 46 [pid 4533] pselect6(10, [7 9], [], NULL, {tv_sec=17, tv_nsec=23000}, NULL) = 1 (in [9], left {tv_sec=17, tv_nsec=23000}) [pid 4533] read(9, "\r\n", 30000) = 2 [pid 4533] pselect6(10, [7 9], [], NULL, {tv_sec=17, tv_nsec=23000}, NULL) = 0 (Timeout) [pid 4533] shutdown(9, SHUT_RDWR) = 0 [pid 4533] close(9) = 0 ... It seems acngtool receives "Bringing index files up to date" (?) and then it times out ofter ~17 seconds and kills the connection and afterwards is killed with SIGABRT: ... [pid 4554] tgkill(4526, 4554, SIGABRT) = 0 [pid 4554] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=4526, si_uid=0} --- [pid 4526] <... futex resumed>) = ? [pid 4554] +++ killed by SIGABRT +++ +++ killed by SIGABRT +++ ... My assumption is, that the ~ 17 seconds wait time is too short. This machine is a debian buildd server for the hppa architecture. It's much slower than x86 machines and sometimes has a high CPU load when building packages. This can lead to slowly running cron jobs which then probably triggers this kind of errors. I briefly looked at the source code of apt-cacher-ng, and it seems many timeouts can be configured in /etc/apt-cacher-ng/acng.conf. Do you have a suggestion which of those is relevant for this cron job? Could it be the nettimeout(17) as from src/acfg_defaults.cc, which can be configured with the "NetworkTimeout" parameter? If so, does it make sense to increase that value by default? Another question would be: If it really turns out to be a time-out issue, would it be possible to print some better warning or error message instead of just SIGABRT ? Thanks, Helge
apt-cacher-ng-trace.log.gz
Description: application/gzip