James B:
> The filesystem operations are indeed not heavy; I have a few cronjobs that 
> perform various sanity checks by doing those mv/ls/cat/touch etc., and they 
> will run at the same frequency whether the CPU is loaded or not. When the CPU 
> is not loaded the kernel can last longer (so far I have tested up to 2 days). 
> I will test more.

Ok.
If you can, please try testing without aufs: repeat the mv operations
under a heavy decoding workload.


> Thanks, I didn't notice this before. I will activate this debug switch and 
> hopefully I can supply you with more info.
> EDIT: It seems to generate huge amount of information, I'm not sure whether 
> that will be useful for you.

In this case, this approach may be more effective.
- insert the following just before every dput() in aufs_rename(), where
  d is the dentry passed to that dput().
        au_debug_on();
        AuDbgDentry(d);
        au_debug_off();
        dput(d);
- note that au_debug_on() is equivalent to setting the module parameter
  "debug" to 1, so during this short window unrelated debug messages
  from other processes may be printed too.
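
To avoid pasting those four lines before every dput(), the sequence could
be folded into one helper. The following is only a sketch: au_dbg_dput()
is a hypothetical name that does not exist in aufs, and struct dentry,
au_debug_on(), au_debug_off(), AuDbgDentry() and dput() are replaced by
simplified userspace stand-ins so that the file compiles outside the
kernel. In the real aufs source you would drop the stand-ins and keep only
the helper.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace stand-ins (assumptions), NOT the real kernel/aufs definitions. */
struct dentry {
	const char *name;
	int refcount;
};

static int au_debug_flag;	/* stands in for the "debug" module parameter */
static int au_debug_lines;	/* counts printed debug lines, for illustration */

static void au_debug_on(void)  { au_debug_flag = 1; }
static void au_debug_off(void) { au_debug_flag = 0; }

/* Stand-in for AuDbgDentry(): prints only while the debug flag is set. */
#define AuDbgDentry(d) do {						\
	if (au_debug_flag) {						\
		fprintf(stderr, "dentry %p name %s\n", (void *)(d),	\
			(d) ? (d)->name : "(null)");			\
		au_debug_lines++;					\
	}								\
} while (0)

/* Stand-in for dput(): like the real one, NULL is silently ignored. */
static void dput(struct dentry *d)
{
	if (!d)
		return;
	assert(d->refcount > 0);
	d->refcount--;
}

/* Hypothetical helper: dump the dentry, then release it. */
static void au_dbg_dput(struct dentry *d)
{
	au_debug_on();
	AuDbgDentry(d);		/* the last line printed before a crash in dput() */
	au_debug_off();
	dput(d);
}
```

With such a helper, each dput(d) in aufs_rename() would become
au_dbg_dput(d), and the debug window stays as short as in the hand-written
version.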


> Contents of /proc/mounts: 
        :::

I guess you mounted aufs in initramfs and did switch_root/chroot, right?
If so, did you "mount --move" the branches before switch_root?


> I am not sure myself, I would have thought that would be the per-process stack 
> size or the bottom of the stack. FYI the kernel is configured for 2G/2G 
> split (instead of 3G/1G).

What is the kernel stack size? 4K or 8K?


Now I am beginning to think the problem may lie outside aufs. The
reasons are:
- the address in your log is 0000003f, which is really strange. Even if
  aufs_rename() passed NULL to dput(), that cannot cause any
  problem, because dput() simply returns immediately for NULL.
- generally a structure and its members are aligned, so an odd address
  such as 0000003f should not happen. But this depends heavily on your
  machine, and I am not sure the alignment assumption holds there.
- if aufs_rename() went totally crazy and passed 0x1 or something like
  that to dput(), then such a message could happen (maybe). But in that
  case I guess another problem would appear earlier.
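
The alignment argument can be illustrated in plain userspace C. This is a
rough sketch only: struct demo_dentry is a made-up stand-in, NOT the
kernel's struct dentry, and its member offsets are not the real ones. It
shows that a faulting address is "base pointer + member offset", so with
aligned members a NULL base cannot produce an odd address, while an odd
address implies a garbage (misaligned) base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up stand-in (assumption); the real struct dentry differs. */
struct demo_dentry {
	int count;
	unsigned int flags;
	void *inode;
	void *parent;
	long lock;
};

/* Offsets of every member of the stand-in structure. */
static const size_t demo_offsets[] = {
	offsetof(struct demo_dentry, count),
	offsetof(struct demo_dentry, flags),
	offsetof(struct demo_dentry, inode),
	offsetof(struct demo_dentry, parent),
	offsetof(struct demo_dentry, lock),
};
#define DEMO_NOFFSETS (sizeof(demo_offsets) / sizeof(demo_offsets[0]))

/* Returns 1 if addr is a multiple of align (align must be a power of two). */
static int is_aligned(uintptr_t addr, size_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* Returns 1 if accessing some member through `base` would touch `fault`. */
static int base_explains_fault(uintptr_t base, uintptr_t fault)
{
	size_t i;

	for (i = 0; i < DEMO_NOFFSETS; i++)
		if (base + demo_offsets[i] == fault)
			return 1;
	return 0;
}
```

For this stand-in on common ABIs, base_explains_fault(0, 0x3f) returns 0,
whereas a base of 0x3f - offsetof(struct demo_dentry, lock) does explain
the fault but is itself an odd value, which no valid pointer to such a
structure would be.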

Anyway, putting AuDbgDentry() before dput() will detect the problem a
little earlier, and we may be able to investigate further. Please try it.
Unfortunately, I cannot reproduce the problem on my side.


J. R. Okajima
