On 20/01/16 22:01, [email protected] wrote:
> OmegaPhil:
>> It has now been some time since I got the kernel memory allocation
>> failures, so clearly the libau hack has fixed it - thanks.
>
> Glad to hear that!
> (Honestly speaking, I totally forgot about this issue)
>
>
>> In the manpage, please can you change 'If you have a directory which has
>> millions of files' to say 'tens of thousands of files', and it would be
>> useful to mention 'page allocation failure' somehow so that its easy for
> :::
>
> How about the attached diff?
The diff looks good; however, for normal users it might be useful to
steer them towards the syslog, since ordinary programs will probably
just report a useless, generic 'I/O error'. So I'd suggest changing:
'You may meet "out of memory" message or "page allocation failure" due
to the memory fragmentation or real starvation'
V
'A program using the directory may throw an "out of memory" error and/or
the kernel may output a "page allocation failure" associated with the
program in the syslog, due to memory fragmentation or real starvation'
>> rsync: readdir("/omega1-storage-4/." (in backups)): Invalid argument (22)
>
> Hmm, won't you investigate it a little more?
> - which systemcall returned EINVAL(22)?
> - what parameter did rsync pass to the systemcall (or readdir)?
>
> And is your $LIBAU set to "all"?
I did look into it on the rsync side, but nothing there looked useful:
see send_directory() in https://download.samba.org/pub/unpacked/rsync/flist.c,
where readdir is called on line 1739 and the error is reported on line 1771.
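Presumably the rsync message means readdir(3) returned NULL with errno set. Because readdir() also returns NULL at the end of a directory, a caller has to clear errno before each call to tell the two cases apart. Below is a minimal C sketch of that pattern, not rsync's actual send_directory() code; count_entries() is a hypothetical helper, and its message format merely imitates rsync's.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the entries in `path`, reporting a readdir()
 * failure in an rsync-like format. readdir() returns NULL both at
 * end-of-directory and on error, so errno is cleared before every call
 * and checked afterwards to distinguish the two. Returns 0 on success,
 * -1 on failure. */
static int count_entries(const char *path, long *count)
{
    DIR *d = opendir(path);
    if (d == NULL) {
        int saved = errno;
        fprintf(stderr, "opendir(\"%s\"): %s (%d)\n",
                path, strerror(saved), saved);
        return -1;
    }

    *count = 0;
    for (;;) {
        errno = 0;                      /* so a NULL return can be classified */
        struct dirent *de = readdir(d);
        if (de == NULL) {
            if (errno != 0) {           /* real error, e.g. EINVAL (22) */
                int saved = errno;
                fprintf(stderr, "readdir(\"%s\"): %s (%d)\n",
                        path, strerror(saved), saved);
                closedir(d);
                return -1;
            }
            break;                      /* genuine end of directory */
        }
        (*count)++;                     /* includes "." and ".." */
    }

    closedir(d);
    return 0;
}
```

Called from a small main() on a healthy directory, count_entries(".", &n) should return 0 with n of at least 2; pointed at a directory where readdir() fails, it would print the readdir(...) message instead of silently returning a short listing.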
The particular test I set up in the VM has suddenly stopped producing the
error, so back on the server I modified the rsync init.d script to run
the daemon under 'strace -fv'. There was one EINVAL hit in the resulting
file; here it is with some context:
=======================================================================
[pid 1293] stat("/omega1-home/", {st_dev=makedev(0, 34), st_ino=273972,
st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096,
st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12,
st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
[pid 1293] chdir("/omega1-home/") = 0
[pid 1293] socketpair(PF_LOCAL, SOCK_STREAM, 0, [4, 6]) = 0
[pid 1293] fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
[pid 1293] fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 1293] fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
[pid 1293] fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 1293] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f7b499149d0) = 1294
Process 1294 attached
[pid 1293] close(6 <unfinished ...>
[pid 1294] set_robust_list(0x7f7b499149e0, 24 <unfinished ...>
[pid 1293] <... close resumed> ) = 0
[pid 1294] <... set_robust_list resumed> ) = 0
[pid 1293] lstat(".", <unfinished ...>
[pid 1294] close(4 <unfinished ...>
[pid 1293] <... lstat resumed> {st_dev=makedev(0, 34), st_ino=273972,
st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096,
st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12,
st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
[pid 1294] <... close resumed> ) = 0
[pid 1293] openat(AT_FDCWD, ".",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC <unfinished ...>
[pid 1294] select(6, [5], [], [5], {60, 0} <unfinished ...>
[pid 1293] <... openat resumed> ) = 6
[pid 1293] brk(0x564d6bf68000) = 0x564d6bf68000
[pid 1293] fstatfs(6, {f_type=0x61756673, f_bsize=4096,
f_blocks=3418641366, f_bfree=652846041, f_bavail=649635649, f_files=0,
f_ffree=0, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
[pid 1293] ioctl(6, _IOC(_IOC_READ|_IOC_WRITE, 0x41, 0x00, 0x40),
0x7ffc3f394810) = -1 EINVAL (Invalid argument)
[pid 1293] sendto(3, "<28>Jan 21 19:11:31 rsyncd[1293]"..., 103,
MSG_NOSIGNAL, NULL, 0) = 103
[pid 1293] fstatfs(6, {f_type=0x61756673, f_bsize=4096,
f_blocks=3418641366, f_bfree=652846041, f_bavail=649635649, f_files=0,
f_ffree=0, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
[pid 1293] futex(0x7f7b48b3d0a8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 1293] close(6) = 0
[pid 1293] lstat(".", {st_dev=makedev(0, 34), st_ino=273972,
st_mode=S_IFDIR|0755, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096,
st_blocks=0, st_size=66, st_atime=2016/01/20-20:44:12,
st_mtime=2014/09/13-12:11:23, st_ctime=2015/01/07-07:46:30}) = 0
=======================================================================
After lstat()ing '.', rsync appears to carry on and lstat the
subdirectories. I'm guessing that because the failing call is an ioctl,
it didn't show up with my usual '-e trace=file' invocation?
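For reference, the two opaque numbers in the trace can be decoded by hand. Below is a minimal C sketch, assuming the asm-generic ioctl bit layout that x86_64 uses: the failing request _IOC(_IOC_READ|_IOC_WRITE, 0x41, 0x00, 0x40) works out to 0xC0404100, i.e. a read/write ioctl of type 'A', command 0, with a 64-byte argument, and the f_type 0x61756673 returned by fstatfs spells "aufs". The helper names (ioc_encode, ioc_type, magic_to_ascii and so on) are made up for illustration, and whether type 'A' is aufs's own control ioctl is an assumption worth checking against the aufs headers.

```c
#include <stdint.h>

/* Bit layout of a Linux ioctl request number, as in
 * include/uapi/asm-generic/ioctl.h (also used on x86_64):
 *   bits  0-7   nr    (command number)
 *   bits  8-15  type  (driver "magic" character)
 *   bits 16-29  size  (size of the argument struct)
 *   bits 30-31  dir   (1 = write, 2 = read, 3 = both)          */
#define IOC_WRITE 1u
#define IOC_READ  2u

/* Hypothetical equivalent of the kernel's _IOC(dir, type, nr, size). */
static uint32_t ioc_encode(uint32_t dir, uint32_t type, uint32_t nr,
                           uint32_t size)
{
    return (dir << 30) | (size << 16) | (type << 8) | nr;
}

/* Hypothetical decoders, mirroring _IOC_DIR/_IOC_TYPE/_IOC_NR/_IOC_SIZE. */
static uint32_t ioc_dir(uint32_t req)  { return req >> 30; }
static uint32_t ioc_type(uint32_t req) { return (req >> 8) & 0xffu; }
static uint32_t ioc_nr(uint32_t req)   { return req & 0xffu; }
static uint32_t ioc_size(uint32_t req) { return (req >> 16) & 0x3fffu; }

/* Spell out a statfs f_type magic as four ASCII characters,
 * most significant byte first (0x61756673 -> "aufs"). */
static void magic_to_ascii(uint32_t magic, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((magic >> (8 * (3 - i))) & 0xffu);
    out[4] = '\0';
}
```

Going the other way, feeding the raw request 0xC0404100 to ioc_type() gives 0x41 ('A') and ioc_size() gives 64, which is handy when a trace prints only the hex value.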
>> This appears to have happened after I upgraded the kernel to v4.3.3-5,
>
> Is this version debian kernel pkg's?
> According to your post in last year, your system is
> 4.2.0-1-amd64 #1 SMP Debian 4.2.5-1 (2015-10-27) x86_64
> GNU/Linux - Debian Testing standard kernel.
>
> If this problem is specific to debian v4.3.3-5 kernel, then I will try
> finding the changes made in
> 1. vanilla v4.3.3
> 2. debian v4.3.3-5
> particulary around ioctl(2).
Just confirmed. On this kernel the setup is fine:
=======================================================================
Linux 4.2.0-1-amd64 #1 SMP Debian 4.2.6-1 (2015-11-10) x86_64 GNU/Linux
=======================================================================
On this one, it breaks:
=======================================================================
Linux 4.3.0-1-amd64 #1 SMP Debian 4.3.3-5 (2016-01-04) x86_64 GNU/Linux
=======================================================================
Yes, these are stock Debian kernels; the only thing I compile myself is
your standalone aufs driver (there are some DKMS modules as well, mind).
Thanks
