Hello James,

James B:
> I've gotten this kernel oops after running the system continuously for about
> 14-15 hours.
> This only happens when the system is under load most of the time (load is
> around 95% of CPU most of the time).
> If the system load is lower (idle being 40-50% load) this doesn't happen.

Although I can guess it is the repeated mv, ls, cat, touch, etc. under aufs,
would you describe the heavy load more specifically?

> Kernel source: https://github.com/SolidRun/linux-linaro-stable-mx6
> Kernel version: 3.10.30 (patched with aufs3.10.x)
> Kernel arch: armv7 dual processor (SMP) - cubox-i

This source tree looks different from v3.10.30, and at least these are added.
- ARM architecture specific modifications (arch/, drivers/, firmware/,
  include/, sound/ and tools/).
- The block device layer has a POWER_EFFICIENT workqueue (block/).
- debugfs supports atomic_t (fs/debugfs/).
- F2FS_FS_SECURITY, F2FS_CHECK_FS for f2fs, but you don't have such branches
  (fs/f2fs/).
- jffs2 is aware of MLC NAND (fs/jffs2/).
- ZBUD, ZSWAP look interesting.
- heterogeneous multiprocessor sounds interesting too...

All these look unrelated to aufs. As long as ZBUD and ZSWAP are working
correctly (if you enable them), the problem seems to be inside aufs, I am
afraid.

> I've already compiled the kernel with DEBUG_INFO, AUFS_DEBUG, extended
> BUG_ON, and DYNAMIC_PRINTK (but I don't see aufs debug statements in dmesg
> although I have already echo "module aufs +p" >
> /sys/kernel/debug/dynamic_debug/control at the beginning).

Aufs has its own debug feature (it existed long before dynamic_debug was
introduced). You can enable the module parameter "debug" dynamically.

> I have only used the first 3 aufs patches (aufs3-kbuild, aufs3-base,
> aufs3-mmap). I didn't use aufs3-loopback (actually, in earlier builds, I used
> that too - and got the same bug). The version of aufs used is the latest (as
> of yesterday) commit for 3.10.x branch:
> b72117ef0c528c4f6013845e6e704d92682fc913

And how do you mount aufs? Would you post your /proc/mounts?

> My apology for the large email size: I'm attaching the oops detail at the end
> of this email since I'm not sure whether the list accept attachments).

No problem.

> From what I can see, it is the first kernel null pointer that is the problem.
> The kernel doesn't immediately crash and continues to run, and all subsequent
> access to aufs fails and results in more BUG_ON later on.

Agreed.
Since the message is produced by __do_kernel_fault() in arch/arm/mm/fault.c, it
might be better to check ZBUD and ZSWAP as well as aufs_rename().

> [55096.152331] Unable to handle kernel NULL pointer dereference at virtual
> address 0000003f
	:::
> [55096.189965] CPU: 0 PID: 31816 Comm: mv Not tainted 3.10.30 #10
> [55096.194501] task: 8e3f9c00 ti: a8cdc000 task.ti: a8cdc000
> [55096.198608] PC is at dput+0x30/0x214
> [55096.200889] LR is at aufs_rename+0x1ea4/0x1ee4
	:::
> [55096.237572] Process mv (pid: 31816, stack limit = 0xa8cdc238)
> [55096.242018] Stack: (0xa8cddde0 to 0xa8cde000)
	:::
> [55096.362071] [<800d6da8>] (dput) from [<8022f908>]
> (aufs_rename+0x1ea4/0x1ee4)
> [55096.367915] [<8022f908>] (aufs_rename) from [<800d059c>]
> (vfs_rename+0x170/0x454)
> [55096.374106] [<800d059c>] (vfs_rename) from [<800d131c>]
> (SyS_renameat+0x244/0x280)
> [55096.380385] [<800d131c>] (SyS_renameat) from [<8000eb00>]
> (ret_fast_syscall+0x0/0x30)
> [55096.386918] Code: f57ff04f e320f004 e3540000 0a000039 (e5943050)
	:::

I am unfamiliar with this oops format for ARM. Anyway, it tells us that the
problem happened in dput(), which is called by aufs_rename().
Do you know the meaning of "stack limit = 0xa8cdc238"? It looks out of the
range of the stack shown in the next line. Is it a problem of ZBUD or ZSWAP?

If you are using ZBUD or ZSWAP, please test them heavily without aufs if you
can.
On my side, I will try reproducing the problem on my Intel PC once I know the
details of your workload test.


J. R. Okajima
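
For reference, the addresses quoted in the oops can be compared mechanically.
The following is only a minimal sketch in Python that does arithmetic on the
values printed above ("ti", "stack limit" and the dumped "Stack:" range). The
function and key names are hypothetical, and treating the span from "ti" to
the end of the dump as the whole per-thread stack area is an assumption, not
something the oops states.

```python
# Values copied from the oops text above; nothing is read from a live kernel.
TI_BASE = 0xA8CDC000      # "ti: a8cdc000"
STACK_LIMIT = 0xA8CDC238  # "stack limit = 0xa8cdc238"
DUMP_START = 0xA8CDDDE0   # "Stack: (0xa8cddde0 to ..."
DUMP_END = 0xA8CDE000     # "... to 0xa8cde000)"


def describe(ti_base, limit, dump_start, dump_end):
    """Relate the printed stack limit to the dumped range and to ti."""
    return {
        # Assumption: ti .. dump_end is the whole per-thread stack area.
        "region_size": dump_end - ti_base,
        "limit_offset_from_ti": limit - ti_base,
        "limit_in_dump": dump_start <= limit < dump_end,
        "limit_in_region": ti_base <= limit < dump_end,
        "gap_below_dump": dump_start - limit,
    }


report = describe(TI_BASE, STACK_LIMIT, DUMP_START, DUMP_END)
for name, value in report.items():
    print(f"{name}: {value}")
```

With these inputs, the limit is outside the dumped range but 568 bytes above
"ti" and inside the 8192-byte span that ends where the dump ends. Whether that
is the intended meaning of "stack limit" on ARM still needs to be confirmed
against arch/arm.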