"Vonlanthen, Elmar": > > > I will check the memory as well. > > The memory check didn't report any error, but I replaced the whole > memory anyway.

Oh, that is not what I meant.
I was talking about "some other SOFTWARE component MAY overwrite
something in memory". I never meant a HARDWARE error.
If this misunderstanding made you buy new memory (hardware), I am
sorry.

> The kernel or any other process don't log any other problem. The only
> thing I see is "mount" which remains blocked.

Ok, understood.

> I can reproduce it again and again on two machines and even with
> replaced RAM. A new awareness is that sometimes the command is
> unblocking again (after a few minutes). But not every time.

I have run your "while" loop test for about an hour, but found nothing
wrong. But my system is linux-3.3-rcN instead of 3.2.1.

> Call Trace:
>  [<c102862b>] ? try_to_wake_up+0x17b/0x1f0
>  [<c1020dc0>] ? __wake_up_common+0x40/0x70
>  [<c10ac1ef>] ? iput+0x2f/0x1a0
>  [<c134b630>] schedule+0x30/0x50
>  [<f80bbdfd>] au_hn_alloc+0x24d/0x370 [aufs]
>  [<c10431a0>] ? wake_up_bit+0x60/0x60
>  [<f80bbb8c>] au_hn_free+0x1c/0x40 [aufs]
>  [<f80b36cb>] au_hiput+0xb/0x20 [aufs]
>  [<f80b380b>] au_iinfo_fin+0x12b/0x1a0 [aufs]
>  [<f80a13ab>] au_si_free+0xabb/0xc00 [aufs]
>  [<c10ac04c>] destroy_inode+0x2c/0x50
>  [<c10ac143>] evict+0xd3/0x150
>  [<c10ac27d>] iput+0xbd/0x1a0
>  [<c10aa1ef>] d_kill+0x9f/0xf0
>  [<c10aa3f0>] shrink_dentry_list+0x1b0/0x1d0
>  [<c10aa8ae>] shrink_dcache_sb+0x5e/0x90
>  [<c1098b25>] do_remount_sb+0x35/0x160
>  [<c1034110>] ? ns_capable+0x20/0x50
>  [<c10b1870>] do_mount+0x500/0x6e0
>  [<c101b840>] ? mm_fault_error+0x130/0x130
>  [<c10b0208>] ? copy_mount_options+0x98/0x110
>  [<c10b1ab6>] sys_mount+0x66/0xa0
>  [<c134d065>] syscall_call+0x7/0xb

This call trace may be unreliable too.
It shows destroy_inode() calling au_si_free(), and au_hn_free()
calling au_hn_alloc(), but there are no such calls in the source files.
Look at the VFS function destroy_inode() in linux/fs/inode.c, and you
will see what I mean.

> You say, that au_hn_alloc() cannot follow au_hn_free(). But how can it
> be, that can reproduce it again and again? Which code is executing the
> function "wake_up_bit()"?

It is prefixed by '?' in the call trace, which means that the address is
unreliable. In other words, if a function name is not prefixed by '?',
it is reliable.

> Is there everything else I can try?

- MagicSysRq + A or
- set the aufs module parameter 'debug' to 1, just before the hanging
  mount

But I am afraid they may not help, since your call trace looks
unbelievable to me.

Another option is modifying aufs-util/mount.aufs.c.
Around line 223, you will see

	flags[AuFlush] = test_flush(opts);
	if (flags[AuFlush] /* && !flags[Fake] */) {
		err = au_plink(cwd, AuPlink_FLUSH,
			       AuPlinkFlag_OPEN | AuPlinkFlag_CLOEXEC, &fd);
		if (err)
			AuFin(NULL);
	}

In your case, the function au_plink() is not called. But calling it may
break the current situation (the hang). So I'd suggest setting
flags[AuFlush] to 1 regardless of the return value from test_flush().


J. R. Okajima