Re: [PATCH] inotify to minimize stat() calls

Thomas Rast Thu, 25 Apr 2013 01:19:01 -0700

Robert Zeh <robert.allan....@gmail.com> writes:

> Here is a patch that creates a daemon that tracks file
> state with inotify, writes it out to a file upon request,
> and changes most of the calls to stat to use said cache.
>
> It has bugs, but I figured it would be smarter to see
> if the approach was acceptable at all before spending the
> time to root the bugs out.


Thanks for tackling this; it's probably about time we got a inotify
support :-(

> I've implemented the communication with a file, and not a socket,
> because I think implementing a socket is going to create
> security issues on multiuser systems.  For example, would a
> socket allow stat information to cross user boundaries?

This ties in with an issue discussed in an earlier thread:

  http://thread.gmane.org/gmane.comp.version-control.git/217817/focus=218307

The conclusion there was that the default limits are set such that it is
not feasible to run one daemon per repository (that would quickly hit
the limits when e.g. iterating all repos in a typical android tree using
repo).

So whatever you use for communication needs to work as a global daemon.

I'd just trust the SSH folks to know about security; on my system
ssh-agent creates

  /tmp/ssh-RANDOMSTRING/agent.PID

where the directory has mode 0700, and the file is a unit socket with
mode 0600.  That should make doubly sure that no other user can open the
socket.

>  filechange-cache.c   | 203
> +++++++++++++++++++++++++++++++++++++++++++++++++++

Is your MUA wrapping the patch?

> +static void watch_directory(int inotify_fd)
> +{
> +     char buf[PATH_MAX];
> +
> +     if (!getcwd(buf, sizeof(buf)))
> +             die_errno("Unable to get current directory");
> +
> +     int i = 0;
> +     struct dir_struct dir;
> +     const char *pathspec[1] = { buf, NULL };
> +
> +     memset(&dir, 0, sizeof(dir));
> +     setup_standard_excludes(&dir);
> +
> +     fill_directory(&dir, pathspec);
> +     for(i = 0; i < dir.nr; i++) {
> +             struct dir_entry *ent = dir.entries[i];
> +             watch_file(inotify_fd, ent->name);
> +             free(ent);
> +     }

I don't get this bit.  The lstat() are run over all files listed in the
index.  So shouldn't your daemon watch exactly those (or rather, all
dirnames of such files)?

The actual directory contents are only needed to find untracked files,
and there would be a lot of complication surrounding that, so I suggest
saving that for later (and for now measuring the speedup with 'git
status -uno'!).

For example, you'd have to actually watch and re-read all .gitignore
files, and the .git/info/exclude, and the core.excludesfile, to see if
your notion of an ignored file became stale.

Also, you seem to call watch_directory() only on the current(?) dir, but
you need to recursively set up watches for all directories in the
repository.

> +     while (1) {
> +             int i = 0;
> +             length = read(inotify_fd, buffer, sizeof(buffer));
> +             for(i = 0; i < length; ) {
> +                     struct inotify_event *event =
> +                             (struct inotify_event*)(buffer+i);
> +                     /* printf("event: %d %x %d %s\n", event->wd, 
> event->mask,
> +                        event->len, event->name); */
> +                     if (request_watch_descriptor == event->wd) {
> +                             write_stat_cache();
> +                     } else if (root_directory_watch_descriptor
> +                                == event->wd) {
> +                             printf("root directory died!\n");
> +                             exit(0);
> +                     } else if (event->mask & IN_Q_OVERFLOW) {
> +                             restart();

Good.

> +                     } else if (event->mask & IN_MODIFY) {
> +                             if (event->len)
> +                                     update_stat_cache(event->name);
> +                     }

So whenever a file changes, you stat() it.  That's good for simplicity
now, but I suspect it will provide some optimization opportunities
later.


On some design aspects, I'd want:

* a toggle to run the test suite with the daemons, or without

* if you go with a user-wide daemon, a way to ensure that the test-suite
  daemon is not the same as my "real" daemon, and make sure it is killed
  after the test runs finish

* a test that triggers IN_Q_OVERFLOW, e.g. by sending SIGSTOP and doing
  a large repository operation

* a test that renames directories

The last one is just based on my personal experience with messing with
inotify; renaming directories is the "hard" case for that API.  We may
already cover this in the test suite, or we may not; but it must be
tested.

Other than that last point, focus your tests not on small tests but on
the test suite.  It would seem rather unlikely to me that you could
manage to pass the entire test suite with this daemon active but broken.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] inotify to minimize stat() calls

Reply via email to