Hi,

"Vonlanthen, Elmar":
> In the attached logfile you will find the aufs debug information.

Thank you very much for testing.


> As you can see, the remount command at 14:49:57 hangs for approximately
> 30 seconds.
> ("find" was running each time ~1s)
>
> I'm not sure if the log-output (taken from syslog-ng) is reliable,
> because there are some strange messages and some error messages from
> syslog-ng itself. And in "dmesg" we don't have timestamps.

I think the log is reliable basically, and the lacking of timestamp is
not a problem becuase the aufs debug messages print it.
The line
        aufs au_dbg_cond:74:mount[11763]: DEBUG: 0,  1798.258628
means "cpu id", "elapsed time since boot"

We can see
        aufs au_dbg_cond:74:mount[11774]: DEBUG: 1,  1799.554563
at the first 14:49:57 log.
Also by grep + cut + uniq -c the log, I got
        :::
     32   1798
     41   1799
      1   1800
      1  nd[11775]: DEBUG: 8    <-- this line is simply broken due to syslog-ng.
     28   1800
     90   1801
     90   1802
        :::
which means that aufs processed 32 inodes in one second, 1798 - 1799.
And in the next one second (1799 - 1800) aufs processed 41 inodes.
These numbers of the processed inodes are relative low, which means aufs
works slow. During 1739 - 1796, aufs processed about 50 inodes per second.
And in the next one second (1800 - 1801) aufs processed about 30 inodes again.
Interestingly, after 1801, aufs processed about 90 inodes per second.

>From your log, the maximum aufs performance on your system looks about
90 inodes per second, and when the system becomes busy it gets about 3
times lower (or worse).


> If the mount command is hanging I see this messages:
        :::
> 2012-01-26 14:50:15 chgut12fw01 kernel: aufs
> au_dbg_cond:74:mount[11774]: DEBUG: 1,  1817.347161

But aufs and the mount process did not seems to be hang totally. It was
still working and just became slow, wasn't it? (I know "too slow" is
nearly equal to "hang")


> Could it be a problem with the kernel thread "fsnotify_mark"?

Maybe.
But, *currently* I don't think it is a problem of fsnotify.
We saw the fsnotify_mark kthread stopped at synchronize_srcu() in
fsnotify_mark_destroy(), and it was waiting for the completion of
another task. Currently, I trust fsnotify and srcu, and don't find any
problem (which may cause your problem) in them.

Assuming your "mount" process did NOT hang actually and it just happen
a simple slow-down, I think it is caused by VFS and aufs, just my
current guess.
- Generally VFS caches dentry (filename) and inode when it is accessed.
- When remounting, VFS tries discarding all unused (but cached)
  dentryies and inodes.
When you run "find" and "remount" at the same time, these caching and
uncaching operations run concurrently. By "find", the dentries are cached
more and more. By remount, VFS tries discarding them, all of them. But
it won't end easily because the "find" producer is still running. Of
course, the "find" producer will end eventualy, but it may take long
time.

In your scenario, you have three processes (threads) for two CPUs (right?)
- mount
  VFS tries discarding the all cached-unused inode, which makes aufs and
  fsnotiy to free the "notify" trick by passing the data to the
  fsnotify_mark kthread.
- find
  it digs the directory tree, and aufs sets the "notify" trick to them.
- fsnotify_mark kthread
  it waits for the data to free.

The context-switch among them might become very heavy. This is my
current guess.

On the other hand, there is a method (d_delete) to stop VFS caching the
unused dentries. Some filesystems implement it, but others don't. Aufs
doesn't implement it either.
If aufs implemented the method, the number of unused-but-cached dentries
would be 0 and the VFS remount operation will not wait.
Let's try this method.
Here is a patch for it. It still may take long time, but will become
much shorter I hope.


J. R. Okajima

diff --git a/fs/aufs/dentry.c b/fs/aufs/dentry.c
index 11ed8c2..f6f94fb 100644
--- a/fs/aufs/dentry.c
+++ b/fs/aufs/dentry.c
@@ -1134,7 +1134,13 @@ static void aufs_d_release(struct dentry *dentry)
 	}
 }
 
+static int aufs_d_delete(const struct dentry *dentry)
+{
+	return 1;
+}
+
 const struct dentry_operations aufs_dop = {
 	.d_revalidate	= aufs_d_revalidate,
-	.d_release	= aufs_d_release
+	.d_release	= aufs_d_release,
+	.d_delete	= aufs_d_delete
 };
------------------------------------------------------------------------------
Keep Your Developer Skills Current with LearnDevNow!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-d2d

Reply via email to