Thank you for looking into this.

On Tue, 15 Jul 2014 14:18:09 +0900
sf...@users.sourceforge.net wrote:

> 
> Although I can guess it is the repeated mv, ls, cat, touch, etc. under
> aufs, would you describe more specifically about the heavy load?
> 

Sorry for the misleading subject title :(

The heavy load is mainly software-decoding of video playback. 

The filesystem operations are indeed not heavy; I have a few cron jobs that 
perform various sanity checks using mv/ls/cat/touch etc., and they run at the 
same frequency whether the CPU is loaded or not. When the CPU is not loaded, the 
kernel lasts longer before failing (so far I have tested up to 2 days). I will 
test more.
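
As a rough illustration of the kind of check involved (this is a hypothetical 
sketch, not the actual cron job; the function name and file names are made up), 
one round of touch/mv/ls/cat on a directory under the aufs mount could look like 
this:

```python
import os
import tempfile


def sanity_check(base_dir):
    """Run one touch/mv/ls/cat round in base_dir; return True if all steps agree."""
    work = tempfile.mkdtemp(prefix="aufs-check-", dir=base_dir)
    src = os.path.join(work, "probe")
    dst = os.path.join(work, "probe.moved")
    try:
        # touch + write: create the file and put a known payload in it
        with open(src, "w") as f:
            f.write("ok\n")
        # mv: rename within the same directory (rename is where the oops shows up)
        os.rename(src, dst)
        # ls: the directory must list only the new name
        names = sorted(os.listdir(work))
        # cat: the payload must survive the rename
        with open(dst) as f:
            payload = f.read()
        return names == ["probe.moved"] and payload == "ok\n"
    finally:
        # clean up whatever is left so repeated runs start fresh
        for name in os.listdir(work):
            os.remove(os.path.join(work, name))
        os.rmdir(work)
```

Running it in a loop while the video decoder is busy would be one way to try 
reproducing the load pattern on another machine.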

> This source tree looks different v3.10.30 and these are added at least.
> - ARM archetecture specific modification (arch/, drivers/, firmware/,
>   include/, sound/ and tools/).
> - block device has POWER_EFFICIENT workqueue (block/).
> - debugfs supports atomic_t (fs/debugfs/).
> - F2FS_FS_SECURITY, F2FS_CHECK_FS for f2fs, but you don't have such

You are right, I don't have the f2fs filesystem enabled.

>   branches (fs/f2fs/).
> - jffs2 awares MLC NAND (fs/jffs2/).
> - ZBUD, ZSWAP looks interesting.
> - heterogenius multiprocessor sounds interesting too...

The particular CPU I'm running does not have heterogeneous multiprocessing; it 
is just a regular SMP (2x Cortex-A9). 

> 
> All these look unrelated to aufs.
> As long as ZBUD and ZSWAP are working correctly (if you enable them),
> the problem seems inside aufs I am afraid.

Indeed, it is far from plain vanilla, but unfortunately so are many kernels used 
in the ARM arena :( 
I may have the chance to use a (near-)vanilla kernel with 3.16 later, depending 
on how much they can push to mainline.
ZBUD and ZSWAP are not enabled in my config (which is attached).

> 
> Aufs has its own debug feature (long before dynamic_debug is
> introduced). You can enable the module paramater "debug" dynamically.

Thanks, I didn't notice this before. I will activate this debug switch and 
hopefully I can supply you with more info.
EDIT: It seems to generate a huge amount of information; I'm not sure whether 
that will be useful for you.
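
Since the output is so large, one option is to switch debug on only around a 
short test window. A minimal sketch, assuming the parameter is exposed at the 
usual sysfs location for module parameters (/sys/module/aufs/parameters/debug) 
and that aufs was built with its debug support; the helper names are 
hypothetical:

```python
import contextlib

# Assumed location of the aufs "debug" module parameter; writing needs root.
AUFS_DEBUG_PARAM = "/sys/module/aufs/parameters/debug"


def set_aufs_debug(enabled, param_path=AUFS_DEBUG_PARAM):
    """Write 1 or 0 to the debug parameter file."""
    with open(param_path, "w") as f:
        f.write("1\n" if enabled else "0\n")


@contextlib.contextmanager
def aufs_debug_window(param_path=AUFS_DEBUG_PARAM):
    """Enable debug for the duration of a with-block, then always disable it."""
    set_aufs_debug(True, param_path)
    try:
        yield
    finally:
        set_aufs_debug(False, param_path)
```

The with-block would wrap only the rename-heavy part of the test, so the log 
stays small enough to post.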

> 
> And how do you mount aufs? Would you post you /proc/mounts?

Contents of /proc/mounts: 

rootfs / rootfs rw 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=371268k,nr_inodes=92817,mode=755 0 0
tmpfs /aufs/pup_init tmpfs ro,relatime 0 0
/dev/loop0 /aufs/kernel-modules squashfs ro,relatime 0 0
/dev/mmcblk0p1 /aufs/devbase vfat rw,relatime,gid=500,fmask=0002,dmask=0002,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,quiet,utf8,errors=remount-ro 0 0
/dev/loop1 /aufs/pup_ro squashfs ro,relatime 0 0
/dev/mmcblk0p1 /aufs/devsave vfat rw,relatime,gid=500,fmask=0002,dmask=0002,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,quiet,utf8,errors=remount-ro 0 0
/dev/loop2 /aufs/pup_save ext4 rw,relatime,data=ordered 0 0
tmpfs /aufs/pup_rw tmpfs rw,relatime 0 0
aufs / aufs rw,relatime,si=5af06edd 0 0
devpts /dev/pts devpts rw,relatime,gid=3,mode=620 0 0
tmpfs /dev/shm tmpfs rw,relatime,mode=777 0 0
tmpfs /tmp tmpfs rw,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
/dev/loop3 /mnt/storage ext3 rw,relatime,data=ordered 0 0
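
The aufs line above shows only si=5af06edd, not the branches themselves. As far 
as I understand, when aufs is built with sysfs support the per-branch entries 
(br0, br1, ...) appear under /sys/fs/aufs/si_<id>/. A small sketch that pulls the 
si value out of /proc/mounts text and builds that path (the function names are 
hypothetical, and the sysfs layout is an assumption about this build):

```python
def find_aufs_mounts(mounts_text):
    """Return (mount_point, si_id) for every aufs entry in /proc/mounts text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # a complete entry has six fields:
        # device, mount point, type, options, dump, pass
        if len(fields) != 6 or fields[2] != "aufs":
            continue
        si_id = None
        for opt in fields[3].split(","):
            if opt.startswith("si="):
                si_id = opt[len("si="):]
        found.append((fields[1], si_id))
    return found


def aufs_sysfs_dir(si_id):
    """Directory assumed to hold the br0, br1, ... files for this mount."""
    return "/sys/fs/aufs/si_" + si_id
```

For the listing above this would give ("/", "5af06edd") and the directory 
/sys/fs/aufs/si_5af06edd.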


> 
> 
> > From what I can see, it is the first kernel null pointer that is the 
> > problem. The kernel doesn't immediately crash and continues to run, and all 
> > subsequent access to aufs fails and results in more BUG_ON later on.
> 
> Agreed.
> Since the message is produced __do_kernel_fault() in
> arch/arm/mm/fault.c, it might be better to check ZBUD and ZSWAP as well
> as aufs_rename().
> 

As above, I actually have both ZBUD and ZSWAP disabled.

> 
> I am unfamilier to this format for ARM. Anyway it tells the problem
> happened in dput() which is called by aufs_rename().
> Do you know the meaning of "stack limit = 0xa8cdc238"?

I am not sure myself; I would have thought it is the per-process stack size or 
the bottom of the stack. FYI, the kernel is configured for a 2G/2G split 
(instead of 3G/1G).
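
One possible reading, which I have not verified against the source: if kernel 
stacks on this 32-bit ARM build are 8 KiB and 8 KiB-aligned, and the "stack 
limit" is an address near the bottom of that region, then the stack the value 
belongs to can be worked out by masking. The constants below are assumptions, 
apart from the 2G/2G split, which puts the start of kernel space at 0x80000000:

```python
PAGE_OFFSET_2G = 0x80000000  # start of kernel space with a 2G/2G split
THREAD_SIZE = 8192           # assumed: 8 KiB, 8 KiB-aligned kernel stacks


def kernel_stack_range(addr, thread_size=THREAD_SIZE):
    """Return (base, top) of the aligned stack region that contains addr."""
    base = addr & ~(thread_size - 1)
    return base, base + thread_size


base, top = kernel_stack_range(0xA8CDC238)
print(hex(base), hex(top))  # 0xa8cdc000 0xa8cde000
```

Under those assumptions the limit sits 0x238 bytes above the base of the region 
0xa8cdc000-0xa8cde000, and it is a kernel address (above 0x80000000), so the 
stack dump on the next line would be expected to fall inside that same region.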

> It looks out of the range of the stack shown in the next line. Is it a
> problem of ZBUD or ZSWAP?
> 
> If you are using ZBUD or ZSWAP, if you can, please test them heavily
> without aufs. On my side, I will try reproducing the problem on my intel
> pc after knowing the detail of your workload test.

Both of them are disabled. 
Note that I don't use swap on the running system, and the system also doesn't 
seem to be running out of memory. If you suggest that I use swap, I will do it 
(the "storage" of the device is flash memory, so it's not ideal to run swap 
there, but if that is what it takes, I'll do it).

Let me know if there is anything else I can help with from my end.

cheers!

-- 
James B <jamesbond3...@gmail.com>
