"Vonlanthen, Elmar":
> > > I will check the memory as well.
>
> The memory check didn't report any error, but I replaced the whole
> memory anyway.

Oh, that is not what I meant.
I was talking about "some other SOFTWARE component MAY overwrite
something in memory". I never meant a HARDWARE error.
If this misunderstanding made you buy new memory (hardware), I am
sorry.


> Neither the kernel nor any other process logs any other problem. The
> only thing I see is "mount" which remains blocked.

Ok, understood.


> I can reproduce it again and again on two machines, even with
> replaced RAM. A new observation is that the command sometimes
> unblocks again (after a few minutes). But not every time.

I ran your "while" loop test for about an hour, but found nothing
wrong. However, my system runs linux-3.3-rcN instead of 3.2.1.


> Call Trace:
>  [<c102862b>] ? try_to_wake_up+0x17b/0x1f0
>  [<c1020dc0>] ? __wake_up_common+0x40/0x70
>  [<c10ac1ef>] ? iput+0x2f/0x1a0
>  [<c134b630>] schedule+0x30/0x50
>  [<f80bbdfd>] au_hn_alloc+0x24d/0x370 [aufs]
>  [<c10431a0>] ? wake_up_bit+0x60/0x60
>  [<f80bbb8c>] au_hn_free+0x1c/0x40 [aufs]
>  [<f80b36cb>] au_hiput+0xb/0x20 [aufs]
>  [<f80b380b>] au_iinfo_fin+0x12b/0x1a0 [aufs]
>  [<f80a13ab>] au_si_free+0xabb/0xc00 [aufs]
>  [<c10ac04c>] destroy_inode+0x2c/0x50
>  [<c10ac143>] evict+0xd3/0x150
>  [<c10ac27d>] iput+0xbd/0x1a0
>  [<c10aa1ef>] d_kill+0x9f/0xf0
>  [<c10aa3f0>] shrink_dentry_list+0x1b0/0x1d0
>  [<c10aa8ae>] shrink_dcache_sb+0x5e/0x90
>  [<c1098b25>] do_remount_sb+0x35/0x160
>  [<c1034110>] ? ns_capable+0x20/0x50
>  [<c10b1870>] do_mount+0x500/0x6e0
>  [<c101b840>] ? mm_fault_error+0x130/0x130
>  [<c10b0208>] ? copy_mount_options+0x98/0x110
>  [<c10b1ab6>] sys_mount+0x66/0xa0
>  [<c134d065>] syscall_call+0x7/0xb

This call trace may be unreliable too.
It shows destroy_inode() calling au_si_free(), and au_hn_free()
calling au_hn_alloc(), but there are no such calls in the source files.
Look at the VFS function destroy_inode() in linux/fs/inode.c, and you
will see what I mean.


> You say that au_hn_alloc() cannot follow au_hn_free(). But how can it
> be that I can reproduce it again and again? Which code is executing the
> function "wake_up_bit()"?

It is prefixed by '?' in the call trace which means that the address is
unreliable. In other words, if the function name is not prefixed by '?',
it is reliable.
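
A minimal sketch of this rule, for filtering a pasted trace by hand. It
assumes the "[<address>] symbol" layout shown in the trace above; the
function name frame_is_reliable() is hypothetical and is not part of
the kernel or aufs.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if a call-trace line names a frame that is NOT marked '?',
 * and 0 if the frame is marked '?' or the line is not a frame at all.
 * Anything before the "[<" (such as "> " quote marks) is ignored.
 */
int frame_is_reliable(const char *line)
{
    const char *p = strstr(line, "[<");   /* start of the address field */

    if (p == NULL)
        return 0;
    p = strchr(p, ']');                    /* end of the address field */
    if (p == NULL)
        return 0;
    p++;
    while (*p == ' ' || *p == '\t')        /* skip blanks before the symbol */
        p++;
    if (*p == '\0' || *p == '\n')
        return 0;                          /* no symbol on this line */
    return *p != '?';
}
```

Applied to the trace above, this keeps lines such as "schedule+0x30/0x50"
and drops lines such as "? wake_up_bit+0x60/0x60".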


> Is there anything else I can try?

- MagicSysRq + A
  or
- set the aufs module parameter 'debug' to 1, just before the mount
  that hangs

But I am afraid they may not help, since your call trace looks
unbelievable to me.

Another option is to modify aufs-util/mount.aufs.c.
Around line 223, you will see
                flags[AuFlush] = test_flush(opts);
                if (flags[AuFlush] /* && !flags[Fake] */) {
                        err = au_plink(cwd, AuPlink_FLUSH,
                                       AuPlinkFlag_OPEN | AuPlinkFlag_CLOEXEC,
                                       &fd);
                        if (err)
                                AuFin(NULL);
                }

In your case, the function au_plink() is not called. But calling it may
get you out of the current situation. So I'd suggest setting
flags[AuFlush] to 1 regardless of the return value from test_flush().
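
The effect of that change can be sketched in isolation as follows. This
is a stand-alone mock, not the real mount.aufs.c: the enum, both stub
functions, the force_flush argument and the call counter are
hypothetical stand-ins, used only to show that the plink step runs once
the flag is forced to 1.

```c
/* Hypothetical stand-ins; NOT the real aufs-util definitions. */
enum { AuFlush, AuFlag_Last };

static int plink_calls;                 /* counts calls to the stub below */

/* Stub for test_flush(): models "no flush requested by the options". */
static int test_flush_stub(const char *opts)
{
    (void)opts;
    return 0;
}

/* Stub for au_plink(): only records that it was called, then succeeds. */
static int au_plink_stub(void)
{
    plink_calls++;
    return 0;
}

/* Returns 0 on success, -1 if the plink step reported an error. */
int remount_prepare(const char *opts, int force_flush)
{
    int flags[AuFlag_Last] = { 0 };

    flags[AuFlush] = test_flush_stub(opts);
    if (force_flush)
        flags[AuFlush] = 1;             /* the suggested experiment */
    if (flags[AuFlush]) {
        if (au_plink_stub())
            return -1;
    }
    return 0;
}

int plink_call_count(void)
{
    return plink_calls;
}
```

With force_flush set to 0 the stubbed plink step is skipped, as in the
reported case; with force_flush set to 1 it is called once.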


J. R. Okajima
