Thank you for looking into this.

On Tue, 15 Jul 2014 14:18:09 +0900
sf...@users.sourceforge.net wrote:
>
> Although I can guess it is the repeated mv, ls, cat, touch, etc. under
> aufs, would you describe more specifically about the heavy load?
>

Sorry for the misleading subject title :(
The heavy load is mainly software decoding of video playback. The filesystem operations are indeed not heavy; I have a few cronjobs that perform various sanity checks by running those mv/ls/cat/touch commands, and they run at the same frequency whether the CPU is loaded or not. When the CPU is not loaded the kernel lasts longer (so far I have tested up to 2 days). I will test more.

> This source tree looks different v3.10.30 and these are added at least.
> - ARM archetecture specific modification (arch/, drivers/, firmware/,
>   include/, sound/ and tools/).
> - block device has POWER_EFFICIENT workqueue (block/).
> - debugfs supports atomic_t (fs/debugfs/).
> - F2FS_FS_SECURITY, F2FS_CHECK_FS for f2fs, but you don't have such

You are right, I don't enable the f2fs filesystem.

>   branches (fs/f2fs/).
> - jffs2 awares MLC NAND (fs/jffs2/).
> - ZBUD, ZSWAP looks interesting.
> - heterogenius multiprocessor sounds interesting too...

The particular CPU I'm running does not have heterogeneous multiprocessing; it is just a regular SMP (2x Cortex-A9).

>
> All these look unrelated to aufs.
> As long as ZBUD and ZSWAP are working correctly (if you enable them),
> the problem seems inside aufs I am afraid.

Indeed, it is far from plain vanilla, but so are many kernels used in the ARM arena, unfortunately :( I may have the chance to use a (near-)vanilla kernel with 3.16 later, depending on how much stuff they can push to mainline. ZBUD and ZSWAP are not enabled in my config (which is attached).

>
> Aufs has its own debug feature (long before dynamic_debug is
> introduced). You can enable the module paramater "debug" dynamically.

Thanks, I didn't notice this before. I will activate this debug switch and hopefully I can supply you with more info.
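For reference, a minimal sketch of toggling that debug switch from a script. It assumes the parameter is exposed at /sys/module/aufs/parameters/debug (the usual sysfs location for a module parameter, and expected to be present only when aufs was built with CONFIG_AUFS_DEBUG); the helper name set_aufs_debug is hypothetical, and writing to the file needs root.

```python
from pathlib import Path

# Assumed sysfs location of the aufs "debug" module parameter; it is only
# expected to exist when aufs is loaded and built with CONFIG_AUFS_DEBUG.
AUFS_DEBUG_PARAM = Path("/sys/module/aufs/parameters/debug")


def set_aufs_debug(enabled, param=AUFS_DEBUG_PARAM):
    """Write 1 or 0 to the parameter file and return the value read back."""
    param.write_text("1\n" if enabled else "0\n")
    return param.read_text().strip()


def main():
    if not AUFS_DEBUG_PARAM.exists():
        print(f"{AUFS_DEBUG_PARAM} not found; is aufs loaded with debug support?")
    else:
        print("aufs debug =", set_aufs_debug(True))
```

Calling main() as root turns the switch on; set_aufs_debug(False) turns it off again, which may help keep the log volume manageable if debugging is enabled only around the failing workload.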
EDIT: It seems to generate a huge amount of information; I'm not sure whether that will be useful for you.

>
> And how do you mount aufs? Would you post you /proc/mounts?

Contents of /proc/mounts:

rootfs / rootfs rw 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=371268k,nr_inodes=92817,mode=755 0 0
tmpfs /aufs/pup_init tmpfs ro,relatime 0 0
/dev/loop0 /aufs/kernel-modules squashfs ro,relatime 0 0
/dev/mmcblk0p1 /aufs/devbase vfat rw,relatime,gid=500,fmask=0002,dmask=0002,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,quiet,utf8,errors=remount-ro 0 0
/dev/loop1 /aufs/pup_ro squashfs ro,relatime 0 0
/dev/mmcblk0p1 /aufs/devsave vfat rw,relatime,gid=500,fmask=0002,dmask=0002,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,quiet,utf8,errors=remount-ro 0 0
/dev/loop2 /aufs/pup_save ext4 rw,relatime,data=ordered 0 0
tmpfs /aufs/pup_rw tmpfs rw,relatime 0 0
aufs / aufs rw,relatime,si=5af06edd 0 0
devpts /dev/pts devpts rw,relatime,gid=3,mode=620 0 0
tmpfs /dev/shm tmpfs rw,relatime,mode=777 0 0
tmpfs /tmp tmpfs rw,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
/dev/loop3 /mnt/storage ext3 rw,relatime,data=ordered 0 0

> >
> > From what I can see, it is the first kernel null pointer that is the
> > problem. The kernel doesn't immediately crash and continues to run, and all
> > subsequent access to aufs fails and results in more BUG_ON later on.
>
> Agreed.
> Since the message is produced __do_kernel_fault() in
> arch/arm/mm/fault.c, it might be better to check ZBUD and ZSWAP as well
> as aufs_rename().
>

As above, I actually have both ZBUD and ZSWAP disabled.

>
> I am unfamilier to this format for ARM. Anyway it tells the problem
> happened in dput() which is called by aufs_rename().
> Do you know the meaning of "stack limit = 0xa8cdc238"?

I am not sure myself; I would have thought it is the per-process stack size or the bottom of the stack.
FYI the kernel is configured for a 2G/2G split (instead of 3G/1G).

> It looks out of the range of the stack shown in the next line. Is it a
> problem of ZBUD or ZSWAP?
>
> If you are using ZBUD or ZSWAP, if you can, please test them heavily
> without aufs. On my side, I will try reproducing the problem on my intel
> pc after knowing the detail of your workload test.

Both of them are disabled. Note that I don't use swap in the running system, and the system doesn't seem to be running out of memory either. If you suggest that I use swap, I will do it (the "storage" of the device is flash memory, so it's not ideal to run swap there, but if that is what it takes, I'll do it).

Let me know if there is anything else I can help with from my end.

cheers!
-- 
James B <jamesbond3...@gmail.com>