Re: [Libhugetlbfs-devel] Improved exorcism

Nishanth Aravamudan Mon, 30 Oct 2006 13:45:27 -0800

On 30.10.2006 [15:56:43 +1100], David Gibson wrote:
> Latest daemon removal path.  Updated to apply clean on top of the
> latest changes in the git tree, otherwise nothing new.
> 
> Now that 1.0.1 is out, are we ready to merge this?  Remember that we
> can make a git branch to hold strictly-maintenance revisions of
> 1.0/1.0.1 for Red Hat, if necessary.


<snip>

Mostly nits below...

> +Now, the same application run twice with the same environment variable
> +settings will use the same file both times (reducing the overall
> +hugetlbfs file usage). If, for any reason, segment sharing is
> +requested but unavailable, the library will fallback to unlinked
> +files.

Since there is no longer automatic removal of files in the hugetlbfs
mount-point after some time (as there was with hugetlbd), this should be
documented. And perhaps the bash script, or whatever you would like to
have as a control mechanism should also be merged with the patch in its
final form.

> Index: libhugetlbfs/elflink.c
> ===================================================================
> --- libhugetlbfs.orig/elflink.c       2006-10-30 13:18:23.000000000 +1100
> +++ libhugetlbfs/elflink.c    2006-10-30 13:26:05.000000000 +1100
> @@ -1,6 +1,6 @@
>  /*
>   * libhugetlbfs - Easy use of Linux hugepages
> - * Copyright (C) 2005-2006 David Gibson & Adam Litke, IBM Corporation.
> + * Copyright (C) 2005-2006 David Gibson & dam Litke, IBM Corporation.

Mistake?

>   *
>   * This library is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU Lesser General Public License
> @@ -24,17 +24,20 @@
>  #include <stdlib.h>
>  #include <malloc.h>
>  #include <string.h>
> +#include <unistd.h>
> +#include <fcntl.h>
>  #include <signal.h>
>  #include <sys/syscall.h>
> +#include <sys/file.h>
>  #include <linux/unistd.h>
>  #include <sys/mman.h>
>  #include <errno.h>
> +#include <limits.h>
>  #include <elf.h>
>  #include <dlfcn.h>
>  
>  #include "hugetlbfs.h"
>  #include "libhugetlbfs_internal.h"
> -

Unnecessary whitespace change.

> +/**
> + * assemble_path - handy wrapper around snprintf() for building paths
> + * @dst: buffer of size PATH_MAX+1 to assemble string into
> + * @...: strings to concatenate into the path, terminated with a NULL
> + */
> +static void assemble_path(char *dst, const char *fmt, ...)
> +{
> +     va_list ap;
> +     int len;
> +
> +     va_start(ap, fmt);
> +     len = vsnprintf(dst, PATH_MAX+1, fmt, ap);

Shouldn't you also check the return value of vsnprintf()? And that makes
assemble_path() need to be a non-void...

> +     va_end(ap);
> +
> +     if (len > PATH_MAX) {
> +             ERROR("Overflow assembling path");
> +             abort();

So you can just return -1 here or something... Why are you aborting by
the way? Shouldn't it just fail to share? [1]

> +static int check_env(void)
> +{

<snip>

> +}

This whole function move & change should be submitted as a separate
patch, it has little to do with removing the daemon and makes this diff
larger than necessary. But it totally makes sense to merge :)

>  /*
>   * Parse an ELF header and record segment information for any segments
>   * which contain hugetlb information.
>   */
> -

Unrelated change.

> +static int find_or_create_share_path(void)

Deserves proper commenting at the head of the function, please.

> +{
> +     char *env;
> +     struct stat sb;
> +     int ret;
> +
> +     env = getenv("HUGETLB_SHARE_PATH");
> +     if (env) {
> +             /* Given an explicit path */
> +             if (hugetlbfs_test_path(env) != 0) {
> +                     ERROR("HUGETLB_SHARE_PATH %s is not on a hugetlbfs\n",

perhaps "mount point" or "filesystem" (even if it's redundant) at the
end of the message?

> +                           share_path);
> +                     return -1;
> +             }
> +             assemble_path(share_path, "%s", env);
> +             return 0;
> +     }
> +
> +     assemble_path(share_path, "%s/elflink-uid-%d",
> +                   hugetlbfs_find_path(), getuid());

[1] ... and check the return values here and abort then. Or just return
-1 a level higher.

> +     ret = mkdir(share_path, 0700);
> +     if ((ret != 0) && (errno != EEXIST)) {
> +             ERROR("Error creating share directory %s", share_path);
> +             return -1;
> +     }
> +
> +     /* Check the share directory is sane */
> +     ret = lstat(share_path, &sb);
> +     if (ret != 0) {
> +             ERROR("Couldn't stat() %s: %s\n", share_path, strerror(errno));
> +             return -1;
> +     }
> +
> +     if (! S_ISDIR(sb.st_mode)) {

I think we've been avoiding the extra space after the '!'.

Does this check make sense given that you're using lstat() above? If the
HUGETLB_SHARE_PATH is a symbolic link, then lstat() is going to stat the
link, not the dir it points to, and then we'll fail here...

> +             ERROR("%s is not a directory", share_path);
> +             return -1;
> +     }
> +
> +     if (sb.st_uid != getuid()) {
> +             ERROR("%s has wrong owner (uid=%d instead of %d)\n",
> +                   share_path, sb.st_uid, getuid());
> +             return -1;
> +     }

To be consistent, probably should have a newline here too.

> +     if (sb.st_mode & (S_IWGRP | S_IWOTH)) {
> +             ERROR("%s has bad permissions 0%03o\n",
> +                   share_path, sb.st_mode);
> +             return -1;
> +     }

But wait, this doesn't make complete sense -- you're only doing all this
sanity checking if we mkdir(). Shouldn't we also do it in the case of a
custom path?

And you're not backing out changes to the system correctly. After the
mkdir() has succeeded, if something else fails, we should rmdir() it.

>  /* 
>   * Look for non-zero BSS data inside a range and print out any matches
>   */
> -

Unrelated change.

>  static void check_bss(unsigned long *start, unsigned long *end)
>  {
>       unsigned long *addr;
> @@ -249,13 +383,52 @@ static void check_bss(unsigned long *sta
>       }
>  }
>  
> +/**
> + * get_shared_file_name - create a shared file name from program name, gid 
> and phdr

The gid is no longer being used, so the comments need to be updated. And
mention of the word-size should be added.

> + * @htlb_seg_info: pointer to program's segment data
> + * @file_path: pointer to a PATH_MAX array to store filename in
> + *
> + * The file name created is *not* intended to be unique, except when the 
> name,
> + * gid or phdr number differ. The goal here is to have a standar means of

Typo from my version, standar -> standard.

<snip>

>  /* 
>   * Subtle:  Since libhugetlbfs depends on glibc, we allow it
>   * it to be loaded before us.  As part of its init functions, it
>   * initializes stdin, stdout, and stderr in the bss.  We need to
>   * include these initialized variables in our copy.
>   */
> -

Unrelated whitespace change.

>  static void get_extracopy(struct seg_info *seg, void **extra_start, 
>                                                       void **extra_end)
>  {
> @@ -387,7 +560,6 @@ bail3:
>   * smallest amount of data possible, unless the user disables this 
>   * optimization via the HUGETLB_ELFMAP environment variable.
>   */
> -

Ditto.

<snip>

> +/**
> + * find_or_prepare_shared_file - get one shareable file
> + * @htlb_seg_info: pointer to program's segment data

This seems rather insufficient for a description of the core logic.

> + * returns:
> + *   -1, on failure
> + *   0, on success
> + */
> +static int find_or_prepare_shared_file(struct seg_info *htlb_seg_info)
> +{
> +     int fdx, fds;
> +     int errnox, errnos;
> +     int ret;
> +     char final_path[PATH_MAX+1];
> +     char tmp_path[PATH_MAX+1];
> +
> +     ret = get_shared_file_name(htlb_seg_info, final_path);
> +     if (ret < 0)
> +             return -1;
> +     assemble_path(tmp_path, "%s.tmp", final_path);
> +
> +     do {

This really needs more commenting, I think. Maybe a little paragraph at
the top like:

We open two files to guarantee exclusive access to only one process. One
file, fdx, is exclusively created. fds is open'd read-only. If fds
successfully opened, someone else created the file we want to share. If
fdx also opened in this case, we are the only sharer, but it doesn't
really matter, since we're not going to try to write to the file
anyways. If fdx failed to open because someone already created the file,
though, we also succeed, since the file we want to share already exists.
Finally, if we were unable to open the file-to-share because it didn't
exist, but we got exclusive access to fdx, then we need to prepare the
segment ourselves. We do the prepare and then rename the file, so that
subsequent preparers will get valid fds's. If both opens failed, someone
else must be preparing the file right now, so we can just wait and see.

Ok, not so little, but I was being verbose. In any case, a description
of the algorithm is needed.

> +             /* NB: mode is modified by umask */
> +             fdx = open(tmp_path, O_CREAT | O_EXCL | O_RDWR, 0666);
> +             errnox = errno;
> +             fds = open(final_path, O_RDONLY);
> +             errnos = errno;
> +
> +             if (fds > 0) {

Stupid nit, but >= technically, right?

<snip>

> +             /* Both opens failed, somebody else is still preparing */
> +             /* Wait and try again */
> +             sleep(1);
> +             /* FIXME: should have a timeout */

Should be easy enough to add, no?

<snip>

Otherwise, looks sane to me...

What kind of testing has been done with the patch?

Thanks,
Nish

-- 
Nishanth Aravamudan <[EMAIL PROTECTED]>
IBM Linux Technology Center

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel

Re: [Libhugetlbfs-devel] Improved exorcism

Reply via email to