On 24/11/15 12:59, OmegaPhil wrote:
> On 24/11/15 05:27, sf...@users.sourceforge.net wrote:
>> OmegaPhil:
>> :::
>>> repos, so I'm using this - currently the rsync init.d script has been
>>> edited to export the right LD_PRELOAD and LIBAU values, and I've
>>> confirmed the library has been loaded via:
>>>
>>> ============================
>>>
>>> lsof -a -c rsync +D /usr/lib
>>>
>>> ============================
>>>
>>> Looking at kern logs the problem happened twice so far this month and 4
>>> times in October, so a month's test should demonstrate this working.
>>
>> You don't have to wait a month.
>> You already have a large dir. Try reproducing the problem by "ls
>> /large_dir", "rsync --dry-run ..." or something. And then confirm
>> setting LD_PRELOAD and LIBAU solves the problem.
>> This simple test has an effect to detect something wrong in setting
>> LD_PRELOAD and LIBAU (hopefully).
>> I'd suggest you to try a simple test. Why? Because I have ever made
>> mistakes in setting LD_PRELOAD and LIBAU. :-)
>>
>>
>> J. R. Okajima
>
>
> Ha, that would be nice - the problem is intermittent, only in the most
> serious case (where ls was also affected) could it be reliably repeated
> (but that situation has long since been cleaned up, haven't had it as
> bad since). So saying that it failed twice in a month means that the
> rsync daily backup worked ~28 times in the same period (for reference
> the backup will cover ~4.3 million files/directories according to
> locate). No issues so far with the current setup.
>
> I'm happy to sit and wait for the real issue to crop up, if it does then
> I can be more aggressive with a test case.
>
> Unless you want me to try and force the issue?

It has now been some time since I last got the kernel memory allocation
failures, so clearly the libau hack has fixed it - thanks.

In the manpage, please can you change 'If you have a directory which has
millions of files' to say 'tens of thousands of files'? It would also be
useful to mention 'page allocation failure' somehow so that it's easy for
others to search on (the problem affects programs interacting with aufs,
resulting in that message in the syslog; it's not obvious what it means or
who is responsible).

Ironically I now have a separate issue with rsync running in daemon mode,
which appears to be due to using libau:

====================================================================

rsync: readdir("/omega1-storage-4/." (in backups)): Invalid argument (22)

====================================================================

Almost every start of an rsync operation fails with this (presumably
reading the base directory of the rsync-shared location immediately fails).
Commenting out the libau stuff in the '/etc/init.d/rsync' script gets rid
of the problem, but naturally I then run into the memory issues again. It
only affects aufs volumes.

This appears to have started after I upgraded the kernel to v4.3.3-5, and
aufs at the same time (v4.3-20160111). I was running the aufs-tools package
from Debian (1:3.2+20130722-1.1), so I built and installed my own aufs-util
package from the latest source, however the problem still occurs. rsync
hasn't changed since 7.03.15 (v3.1.1-3), and the failure only triggers when
using it as a daemon (i.e. with the rsync protocol) - rsync over SSH works
fine.

Has anyone else had such problems?
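
For anyone wanting to check their own logs for the 'page allocation failure' symptom mentioned above, here is a rough Python sketch that counts matching lines per month. It assumes the traditional syslog line layout ("Mon DD HH:MM:SS host kernel: ..."); the function name and the sample line in the comment are made up for illustration and are not taken from the system discussed in this thread.

```python
import re
from collections import Counter

# Assumed traditional syslog layout: "Mon DD HH:MM:SS host kernel: ...".
# Only the month abbreviation at the start of the line is used.
MONTH_RE = re.compile(r"^([A-Z][a-z]{2})\s+\d{1,2}\s")

# The phrase suggested above as a search term for the aufs-related failures.
NEEDLE = "page allocation failure"


def count_failures_by_month(lines):
    """Count lines containing NEEDLE, keyed by month abbreviation.

    Lines that contain the phrase but do not start with a syslog-style
    date are counted under the key "unknown".
    """
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        match = MONTH_RE.match(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Example use (hypothetical path; adjust to wherever the kernel log lives):
#
#     with open("/var/log/kern.log", errors="replace") as log:
#         print(count_failures_by_month(log))
```

This only counts occurrences of the message; it does not tell you which filesystem was involved, so the surrounding log lines still need to be read by hand.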