On Sat, Jun 16, 2018 at 04:46:21PM +0100, John Keeping wrote:
> On Wed, Jun 13, 2018 at 03:02:42PM -0400, Konstantin Ryabitsev wrote:
> > 2. I have witnessed cache corruption due to collisions (which is
> > a bug in itself). One of our frontends was hit by a lot of agressive
> > crawling of snapshots that raised the load to 60+ (many, many gzip
> > processes). After we blackholed the bot, some of the cache objects for
> > non-snapshot URLs had trailing gzip junk in them, meaning that either
> > two instances were writing to the same file, or something else resulted
> > in cache corruption. This is probably a race condition somewhere in the
> > locking code.
> 
> I've had a look at this, and I think we might end up dropping our lock
> too early thanks to this code (in fill_slot()):
> 
>       /* Restore stdout */
>       if (dup2(tmp, STDOUT_FILENO) == -1) {
> 
> Before this line, STDOUT_FILENO refers to lock_fd which has a POSIX
> advisory record lock on the entire file.  However, the documentation for
> that says:
> 
>        *  If a process closes any file descriptor referring to a file, then 
> all
>           of the process's locks on that file are released, regardless of the
>           file descriptor(s)  on  which  the  locks  were  obtained.  This is
>           bad: it means that a process can lose its locks on a file such as
>           /etc/passwd or  /etc/mtab  when for some reason a library function
>           decides to open, read, and close the same file.
> 
> I haven't verified this, but I suspect that dup'ing the original stdout
> over STDOUT_FILENO is equivalent to closing a file descriptor referring
> to our lock file.  And thus the lock is released at this point, which is
> before we rename the lock file over the cache file.
> 
> If that is correct, then there is a window during which a new process
> can open the lock file to write new content and successfully acquire the
> lock on that file even though it is still being used by another process.

I confirmed this behaviour with trace-cmd.

Before:
            cgit-7291  : posix_lock_inode:     fl=0x0xffff88020faa6258 
dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 
fl_pid=7291 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 
fl_end=9223372036854775807 ret=0
            cgit-7291  : fcntl_setlk:          fl=0x0xffff88020faa6258 
dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 
fl_pid=7291 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 
fl_end=9223372036854775807 ret=0
            cgit-7291  : locks_get_lock_context: dev=0x8:0x14 ino=0x4e3e13 
type=F_UNLCK ctx=0xffff8801beeb7930
            cgit-7291  : posix_lock_inode:     fl=0x0xffffc90003627da8 
dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 
fl_pid=7291 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 
fl_end=9223372036854775807 ret=0
            cgit-7291  : locks_remove_posix:   fl=0x0xffffc90003627da8 
dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff88007a8f2680 
fl_pid=7291 fl_flags=FL_POSIX|FL_CLOSE fl_type=F_UNLCK fl_start=0 
fl_end=9223372036854775807 ret=0
            cgit-7291  : sys_enter_rename:     oldname: 0x559bd0bd5830, 
newname: 0x559bd0bd57d0
            cgit-7291  : sys_exit_rename:      0x0

After:
            cgit-7488  : posix_lock_inode:     fl=0x0xffff8802122c43e8 
dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff8800310958c0 
fl_pid=7488 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 
fl_end=9223372036854775807 ret=0
            cgit-7488  : fcntl_setlk:          fl=0x0xffff8802122c43e8 
dev=0x8:0x14 ino=0x4e3e13 fl_next=0x(nil) fl_owner=0x0xffff8800310958c0 
fl_pid=7488 fl_flags=FL_POSIX fl_type=F_WRLCK fl_start=0 
fl_end=9223372036854775807 ret=0
            cgit-7488  : sys_enter_rename:     oldname: 0x56512cd7f830, 
newname: 0x56512cd7f7d0
            cgit-7488  : sys_exit_rename:      0x0
            cgit-7488  : locks_get_lock_context: dev=0x8:0x14 ino=0x4e3e13 
type=F_UNLCK ctx=0xffff8802144daa10
            cgit-7488  : posix_lock_inode:     fl=0x0xffffc900038d7da8 
dev=0x8:0x14 ino=0x4e3e13 fl_next=0x0xffff880006f5b780 
fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX|FL_CLOSE 
fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0
            cgit-7488  : locks_remove_posix:   fl=0x0xffffc900038d7da8 
dev=0x8:0x14 ino=0x4e3e13 fl_next=0x0xffff880006f5b780 
fl_owner=0x0xffff8800310958c0 fl_pid=7488 fl_flags=FL_POSIX|FL_CLOSE 
fl_type=F_UNLCK fl_start=0 fl_end=9223372036854775807 ret=0


I'm planning to queue the patch below on jk/for-jason and send a PR in
the next day or two, but it would be nice to get a reviewed-by before I
do that.

> -- >8 --
> Subject: [PATCH] cache: close race window when unlocking slots
> 
> We use POSIX advisory record locks to control access to cache slots, but
> these have an unhelpful behaviour in that they are released when any
> file descriptor referencing the file is closed by this process.
> 
> Mostly this is okay, since we know we won't be opening the lock file
> anywhere else, but there is one place that it does matter: when we
> restore stdout we dup2() over a file descriptor referring to the file,
> thus closing that descriptor.
> 
> Since we restore stdout before unlocking the slot, this creates a window
> during which the slot content can be overwritten.  The fix is reasonably
> straightforward: simply restore stdout after unlocking the slot, but the
> diff is a bit bigger because this requires us to move the temporary
> stdout FD into struct cache_slot.
> 
> Signed-off-by: John Keeping <[email protected]>
> ---
>  cache.c | 37 ++++++++++++++-----------------------
>  1 file changed, 14 insertions(+), 23 deletions(-)
> 
> diff --git a/cache.c b/cache.c
> index 0901e6e..2c70be7 100644
> --- a/cache.c
> +++ b/cache.c
> @@ -29,6 +29,7 @@ struct cache_slot {
>       cache_fill_fn fn;
>       int cache_fd;
>       int lock_fd;
> +     int stdout_fd;
>       const char *cache_name;
>       const char *lock_name;
>       int match;
> @@ -197,6 +198,13 @@ static int unlock_slot(struct cache_slot *slot, int 
> replace_old_slot)
>       else
>               err = unlink(slot->lock_name);
>  
> +     /* Restore stdout and close the temporary FD. */
> +     if (slot->stdout_fd >= 0) {
> +             dup2(slot->stdout_fd, STDOUT_FILENO);
> +             close(slot->stdout_fd);
> +             slot->stdout_fd = -1;
> +     }
> +
>       if (err)
>               return errno;
>  
> @@ -208,42 +216,24 @@ static int unlock_slot(struct cache_slot *slot, int 
> replace_old_slot)
>   */
>  static int fill_slot(struct cache_slot *slot)
>  {
> -     int tmp;
> -
>       /* Preserve stdout */
> -     tmp = dup(STDOUT_FILENO);
> -     if (tmp == -1)
> +     slot->stdout_fd = dup(STDOUT_FILENO);
> +     if (slot->stdout_fd == -1)
>               return errno;
>  
>       /* Redirect stdout to lockfile */
> -     if (dup2(slot->lock_fd, STDOUT_FILENO) == -1) {
> -             close(tmp);
> +     if (dup2(slot->lock_fd, STDOUT_FILENO) == -1)
>               return errno;
> -     }
>  
>       /* Generate cache content */
>       slot->fn();
>  
>       /* Make sure any buffered data is flushed to the file */
> -     if (fflush(stdout)) {
> -             close(tmp);
> +     if (fflush(stdout))
>               return errno;
> -     }
>  
>       /* update stat info */
> -     if (fstat(slot->lock_fd, &slot->cache_st)) {
> -             close(tmp);
> -             return errno;
> -     }
> -
> -     /* Restore stdout */
> -     if (dup2(tmp, STDOUT_FILENO) == -1) {
> -             close(tmp);
> -             return errno;
> -     }
> -
> -     /* Close the temporary filedescriptor */
> -     if (close(tmp))
> +     if (fstat(slot->lock_fd, &slot->cache_st))
>               return errno;
>  
>       return 0;
> @@ -393,6 +383,7 @@ int cache_process(int size, const char *path, const char 
> *key, int ttl,
>       strbuf_addstr(&lockname, ".lock");
>       slot.fn = fn;
>       slot.ttl = ttl;
> +     slot.stdout_fd = -1;
>       slot.cache_name = filename.buf;
>       slot.lock_name = lockname.buf;
>       slot.key = key;
> -- 
> 2.17.1
_______________________________________________
CGit mailing list
[email protected]
https://lists.zx2c4.com/mailman/listinfo/cgit

Reply via email to