Hello

> > > > I will check the memory as well.
> >
> > The memory check didn't report any error, but I replaced the whole
> > memory anyway.
> 
> Oh, that is not what I meant.
> I was talking about "some other SOFTWARE component MAY overwrite
> something in memory". I never meant a HARDWARE error.
> If this misunderstanding forced you to buy new memory (hardware), I
> am sorry.

No problem. I had the memory already. I misunderstood you.

> > I can reproduce it again and again on two machines, even with
> > replaced RAM. A new finding is that the command sometimes
> > unblocks again (after a few minutes). But not every time.
> 
> I have run your "while" loop test for about an hour, but found
> nothing wrong. But my system runs linux-3.3-rcN instead of 3.2.1.

I also have systems where I am unable to reproduce the error. It's very
strange.

> > Call Trace:
> >  [<c102862b>] ? try_to_wake_up+0x17b/0x1f0
> >  [<c1020dc0>] ? __wake_up_common+0x40/0x70
> >  [<c10ac1ef>] ? iput+0x2f/0x1a0
> >  [<c134b630>] schedule+0x30/0x50
> >  [<f80bbdfd>] au_hn_alloc+0x24d/0x370 [aufs]
> >  [<c10431a0>] ? wake_up_bit+0x60/0x60
> >  [<f80bbb8c>] au_hn_free+0x1c/0x40 [aufs]
> >  [<f80b36cb>] au_hiput+0xb/0x20 [aufs]
> >  [<f80b380b>] au_iinfo_fin+0x12b/0x1a0 [aufs]
> >  [<f80a13ab>] au_si_free+0xabb/0xc00 [aufs]
> >  [<c10ac04c>] destroy_inode+0x2c/0x50
> >  [<c10ac143>] evict+0xd3/0x150
> >  [<c10ac27d>] iput+0xbd/0x1a0
> >  [<c10aa1ef>] d_kill+0x9f/0xf0
> >  [<c10aa3f0>] shrink_dentry_list+0x1b0/0x1d0
> >  [<c10aa8ae>] shrink_dcache_sb+0x5e/0x90
> >  [<c1098b25>] do_remount_sb+0x35/0x160
> >  [<c1034110>] ? ns_capable+0x20/0x50
> >  [<c10b1870>] do_mount+0x500/0x6e0
> >  [<c101b840>] ? mm_fault_error+0x130/0x130
> >  [<c10b0208>] ? copy_mount_options+0x98/0x110
> >  [<c10b1ab6>] sys_mount+0x66/0xa0
> >  [<c134d065>] syscall_call+0x7/0xb
> 
> This call trace may be unreliable too.
> You can see destroy_inode() calling au_si_free(), as well as
> au_hn_free() calling au_hn_alloc(), but there are no such calls in the
> source files.
> Look at the VFS function destroy_inode() in linux/fs/inode.c, and you
> can find what I mean.
> 
> > You say that au_hn_alloc() cannot follow au_hn_free(). But how can it
> > be that I can reproduce it again and again? Which code is executing
> > the function "wake_up_bit()"?
> 
> It is prefixed by '?' in the call trace, which means that the address
> is unreliable. In other words, if the function name is not prefixed by
> '?', it is reliable.

Ok, I understand.

> > Is there anything else I can try?
> 
> - MagicSysRq + A
>   or
> - set the aufs module parameter 'debug' to 1, just before the hanging
>   mount
> 
> But I am afraid they may not help, since your call trace looks
> unbelievable to me.
> 
> Another option is modifying aufs-util/mount.aufs.c.
> Around line 223, you will see
>               flags[AuFlush] = test_flush(opts);
>               if (flags[AuFlush] /* && !flags[Fake] */) {
>                       err = au_plink(cwd, AuPlink_FLUSH,
>                                      AuPlinkFlag_OPEN | AuPlinkFlag_CLOEXEC,
>                                      &fd);
>                       if (err)
>                               AuFin(NULL);
>               }
> 
> In your case, the function au_plink() is not called. But calling it may
> break the current situation. So I'd suggest that you set
> flags[AuFlush] to 1 regardless of the return value from test_flush().

I will try again, with this change and your patch from the other post.

I have some news:
There is a cron job that causes the problem. It runs "rsync" in
*dry-run* mode between / and a branch (/mnt/overlay). I can also trigger
the problem with a simple "find /usr -type f".
Most of the time, when the command hangs, it does so only briefly, for
some seconds or a few minutes.

I will do some further research.
Thanks for your help.

Best regards
Elmar

