Hi,

How can some library code determine the name of the running program,
for error message and display purposes, if the program's main() function
has not stored argv[0] in a particular place?

Let's be clear about two things:

  1) This question is mostly irrelevant for libposix, because library
     functions in libposix should not call exit() and should not print
     error messages. The only ways out that a library function has are
     to call abort() if there was a programming error, and to return
     an error code or error message that the caller can then handle.

  2) Even if we find an answer to this question, it does not magically
     resolve all problems with the 'error' module, because the question
     remains where the 'program_name' variable shall be allocated,
     and it is a portability problem for MacOS X, AIX, Solaris, Cygwin [1].

There was this thread [2] in 2006 where we found out that we needed some
new API [3] because we did not want to implement the BSD functions with
different semantics.

Recently, Bastien Roucariès revived the topic in two threads [4][5],
by proposing to fetch the program invocation name (= argv[0] or a
variant of it) through the API that 'ps' and 'top' typically use.

Find attached a collection of the platform-dependent code snippets.
I added code for Cygwin, mingw (from progreloc.c), and OSF/1, and
tested it on the various platforms.

The results of this investigation are:

  1) There are six different notions of "program name".

       - get_program_invocation_short_name ()
         This is what glibc calls 'program_invocation_short_name' and
         BSD calls 'getprogname ()'. It's the basename of argv[0].

       - get_program_invocation_short_name_truncated ()
         This is like get_program_invocation_short_name (), except it
         is truncated to a certain number of bytes (14, 16, or 32).

       - get_program_invocation_name ()
         This is what glibc calls 'program_invocation_name'. It's the
         argv[0] before the search in $PATH was performed.

       - get_program_absolute_name ()
         This is like get_program_invocation_name(), except that it
         makes the result absolute by stuffing in the current directory
         if needed.

       - get_resolved_program_invocation_name ()
         This is like get_program_invocation_name(), except that it
         resolves symbolic links and removes trivial filename components.

       - get_program_canonicalized_name ()
         This is like a combination of get_program_absolute_name() and
         get_resolved_program_invocation_name(): It resolves symbolic
         links, removes trivial filename components, and stuffs in the
         current directory.

  2) Each of the functions may fail, i.e. return NULL.

  3) It's highly platform dependent. Here's the matrix of available functions:

        A (X) = get_program_invocation_short_name ()
        A (x) = get_program_invocation_short_name_truncated ()
        B     = get_program_invocation_name ()
        C     = get_program_absolute_name ()
        D     = get_resolved_program_invocation_name ()
        E     = get_program_canonicalized_name ()

                     | A | B | C | D | E |
      ---------------+---+---+---+---+---+
       glibc/Linux   | X | X |   |   | X |
       glibc/other   | X | X |   |   |   |
       uClibc/Linux  |   |   |   |   | X |
       MacOS X       | X |   | X |   |   |
       FreeBSD       | X |   |   |   |   |
       NetBSD        | X |   |   |   |   |
       OpenBSD       | X |   |   |   |   |
       AIX           | x | X |   |   |   |
       HP-UX         | x |   |   |   |   |
       IRIX          | x |   |   |   |   |
       OSF/1         | x |   |   |   |   |
       Solaris       | X |   |   | X |   |
       Cygwin        | X |   |   |   | X |
       mingw         |   |   |   |   | X |
       Interix       |   |   |   |   |   |
      ---------------+---+---+---+---+---+

     Given B, you can determine C, D, E, by assuming the current directory
     and $PATH have not changed since the program was launched.

     But given A only, one cannot determine B, C, D, E. And unfortunately,
     on many Unix platforms, A is the only thing you can get.
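
To illustrate the "given B" direction, here is a rough sketch of how A and C
could be derived from B. The helper names are hypothetical and not part of
the attached snippets; the current directory and $PATH are passed in
explicitly, which is exactly where the "have not changed since launch"
assumption enters. It ignores a trailing empty $PATH element, does not make
relative $PATH elements absolute, and does not distinguish directories from
executable files.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: A from B.  Returns the part of ARGV0 after the
   last slash (if any).  The result points into ARGV0.  */
static const char *
short_name_from_invocation_name (const char *argv0)
{
  const char *slash;

  assert (argv0 != NULL);
  slash = strrchr (argv0, '/');
  return slash != NULL ? slash + 1 : argv0;
}

/* Hypothetical helper: returns a freshly allocated "DIR/NAME", where DIR
   is given as a pointer and a length, or NULL if out of memory.  */
static char *
concat_dir_name (const char *dir, size_t dirlen, const char *name)
{
  size_t namelen = strlen (name);
  char *result = malloc (dirlen + 1 + namelen + 1);

  if (result == NULL)
    return NULL;
  memcpy (result, dir, dirlen);
  result[dirlen] = '/';
  memcpy (result + dirlen + 1, name, namelen + 1);
  return result;
}

/* Hypothetical helper: C from B.  CWD and PATH are assumed to be the
   values that were in effect when the program was launched.  Returns a
   freshly allocated string, or NULL if nothing was found.  */
static char *
absolute_name_from_invocation_name (const char *argv0, const char *cwd,
                                    const char *path)
{
  assert (argv0 != NULL && cwd != NULL);
  if (argv0[0] == '/')
    /* Already absolute.  */
    return strdup (argv0);
  if (strchr (argv0, '/') != NULL)
    /* Relative filename: stuff in the current directory.  */
    return concat_dir_name (cwd, strlen (cwd), argv0);
  /* No slash: emulate the search in PATH.  */
  while (path != NULL && *path != '\0')
    {
      size_t dirlen = strcspn (path, ":");
      /* An empty PATH element denotes the current directory.  */
      char *candidate =
        (dirlen > 0
         ? concat_dir_name (path, dirlen, argv0)
         : concat_dir_name (cwd, strlen (cwd), argv0));

      if (candidate == NULL)
        return NULL;
      if (access (candidate, X_OK) == 0)
        return candidate;
      free (candidate);
      path += dirlen;
      if (*path == ':')
        path++;
    }
  return NULL;
}
```

With these, "./foo" launched from /currdir yields "/currdir/./foo", which
matches what MacOS X reports for C in the attached snippets.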

What can we reasonably do with this?

(a) We could create a new module that exports functions

    /* Returns the base name of argv[0], if known.  */
    const char *get_program_invocation_short_name ();

    /* Returns the truncated base name of argv[0], if known.  */
    const char *get_program_invocation_short_name_truncated ();
    size_t get_program_invocation_short_name_truncation_length ();

    /* Return argv[0], without resolving symlinks or current directory
       if possible.  */
    const char *get_program_invocation_name ();

    /* Return argv[0] as a canonical filename.  Assumes that the current
       directory and $PATH have not changed since the program was launched.  */
    const char *get_program_canonicalized_name ();

    Of course these functions would cache their respective result once it has
    been determined.

    And of course there are platforms, like Interix, NonStop Kernel, or Haiku,
    where all functions will return NULL.
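
    A minimal sketch of the caching and of the caller-side fallback, under
    the assumption that the platform-specific lookup is hidden behind one
    function. All names below are hypothetical; the lookup stub simply
    behaves like a platform where nothing is available. Note that a NULL
    result has to be cached too, hence the separate flag. The sketch is not
    thread-safe.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the (possibly expensive) lookup really runs.  */
static int lookup_count;

/* Hypothetical stand-in for the platform-specific code.  Here it behaves
   like a platform such as Interix, where no name can be determined.  */
static const char *
lookup_program_invocation_short_name (void)
{
  lookup_count++;
  return NULL;
}

/* Returns the base name of argv[0], if known, else NULL.
   The result is determined at most once, even if it is NULL.  */
static const char *
cached_program_invocation_short_name (void)
{
  static int determined;
  static const char *cached;

  if (!determined)
    {
      cached = lookup_program_invocation_short_name ();
      determined = 1;
    }
  return cached;
}

/* What a caller that prints messages would have to do: supply its own
   FALLBACK for the platforms where the name is unknown.  */
static const char *
program_name_for_messages (const char *fallback)
{
  const char *name;

  assert (fallback != NULL);
  name = cached_program_invocation_short_name ();
  return name != NULL ? name : fallback;
}
```

    Every caller needs such a fallback, since a NULL return is possible on
    any platform.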

(b) We can observe that these proposed functions would still have
    portability pitfalls:
      - truncation of the short name,
      - differences regarding relative filenames and symlinks,
      - differences regarding the ".exe" suffix on Windows.
    We could therefore decide that it is better to have no gnulib API at
    all than an API that has portability problems.
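
    To make two of these pitfalls concrete, here is a sketch of what a
    caller would need just to compare a reported short name against the
    name it expects. The function names and the convention "0 means no
    truncation" are assumptions made for this illustration only. It does
    not handle a ".exe" suffix that was itself cut off by truncation.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Returns the length of NAME, not counting a trailing ".exe" suffix
   (in any letter case) as found on Windows.  */
static size_t
length_without_exe_suffix (const char *name)
{
  size_t len = strlen (name);

  if (len > 4 && strncasecmp (name + len - 4, ".exe", 4) == 0)
    len -= 4;
  return len;
}

/* Returns 1 if REPORTED, as obtained from the platform, may denote the
   program whose short name is EXPECTED, and 0 otherwise.
   TRUNCATION_LENGTH is the number of bytes to which the platform
   truncates short names (14, 16, or 32 in the attached snippets), or 0
   if it does not truncate.  */
static int
short_name_matches (const char *reported, const char *expected,
                    size_t truncation_length)
{
  size_t rlen;
  size_t elen;

  assert (reported != NULL && expected != NULL);
  rlen = length_without_exe_suffix (reported);
  elen = length_without_exe_suffix (expected);
  /* On a truncating platform, only a prefix of EXPECTED can be seen.  */
  if (truncation_length > 0 && elen > truncation_length)
    elen = truncation_length;
  return rlen == elen && memcmp (reported, expected, rlen) == 0;
}
```

    A match on a truncating platform is only a "may be": two programs whose
    names share the first 14 bytes cannot be told apart this way.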

What do you think?

Bruno

[1] http://lists.gnu.org/archive/html/bug-recutils/2010-12/msg00010.html
[2] http://lists.gnu.org/archive/html/bug-gnulib/2006-01/msg00005.html
[3] http://lists.gnu.org/archive/html/bug-gnulib/2006-01/msg00122.html
[4] http://lists.gnu.org/archive/html/bug-gnulib/2010-11/msg00088.html
[5] http://lists.gnu.org/archive/html/bug-gnulib/2010-12/msg00064.html
/* Guessing the program name.  */

/* 3 ways to call a program:
   $ foo                            - search in PATH
   $ ./foo                          - relative filename
   $ /bin/foo                       - absolute filename
 */

/* See also these discussions:
   http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/5080/focus=5082
   http://fixunix.com/unix/542990-getting-mains-argv-0-a.html
 */

/* ============================= glibc systems ============================= */
#if !1

/* Reference: glibc manual
   http://www.gnu.org/software/libc/manual/html_node/Error-Messages.html
 */

#define _GNU_SOURCE 1
#include <errno.h>

/* The value of argv[0].
   $ foo                            -> "foo"
   $ ./foo                          -> "./foo"
   $ /bin/foo                       -> "/bin/foo"
 */
static const char *
get_program_invocation_name ()
{
  return program_invocation_name;
}

/* The part after the last slash (if any) of the value of argv[0].
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ /bin/foo                       -> "foo"
 */
static const char *
get_program_invocation_short_name ()
{
  return program_invocation_short_name;
}

#include <stdio.h>

int
main ()
{
  printf ("get_program_invocation_name() = %s\n", get_program_invocation_name ());
  printf ("get_program_invocation_short_name() = %s\n", get_program_invocation_short_name ());
  return 0;
}

#endif

/* ============================ uClibc systems ============================ */

/* Depending on the configuration, libc may or may not contain the variables
   'program_invocation_name' and 'program_invocation_short_name', like glibc.
 */

/* ============================= Linux systems ============================= */
#if !1

/* The executable is accessible as /proc/<pid>/exe.  In newer Linux
   versions, also as /proc/self/exe.  Linux >= 2.1 provides a symlink
   to the true pathname; older Linux versions give only device and ino,
   enclosed in brackets, which we cannot use here.  */

#define _GNU_SOURCE 1
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* The canonicalized file name of the program.
   $ foo                            -> "/opt/gnu/bin/foo"
   $ ./foo                          -> "/opt/gnu/bin/foo"
   $ /bin/foo                       -> "/opt/gnu/bin/foo"
 */
static char *
get_program_canonicalized_name ()
{
  char buf[PATH_MAX];
  ssize_t ret;

  /* readlink does not NUL-terminate BUF; RET is the number of bytes
     that were stored.  */
  ret = readlink ("/proc/self/exe", buf, sizeof (buf));
  if (ret <= 0 || buf[0] == '[')
    return NULL;
  return strndup (buf, ret);
}

#include <stdio.h>

int
main ()
{
  printf ("get_program_canonicalized_name() = %s\n", get_program_canonicalized_name ());
  return 0;
}

#endif

/* ============================ MacOS X systems ============================ */
#if !1

#include <mach-o/dyld.h>
#include <string.h>

/* On MacOS X 10.2 or newer, the function
     int _NSGetExecutablePath (char *buf, uint32_t *bufsize);
   can be used to retrieve the executable's full path.
   Reference:
   <http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man3/dyld.3.html>
 */

/* The absolute name (but not canonicalized) of the value of argv[0].
   $ foo                            -> /found_dir/foo
   $ ./foo                          -> /currdir/./foo
   $ symlink/foo                    -> /currdir/symlink/foo
   $ /bin/foo                       -> /bin/foo
 */
static char *
get_program_absolute_name ()
{
  char location[4096];
  unsigned int length = sizeof (location);
  if (_NSGetExecutablePath (location, &length) == 0)
    return strdup (location);
  else
    return NULL;
}

#include <stdlib.h>

/* The part after the last slash (if any) of the value of argv[0].
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ /bin/foo                       -> "foo"
 */
static const char *
get_program_invocation_short_name ()
{
  return getprogname ();
}

#include <stdio.h>

int
main ()
{
  printf ("get_program_absolute_name() = %s\n", get_program_absolute_name ());
  printf ("get_program_invocation_short_name() = %s\n", get_program_invocation_short_name ());
  return 0;
}

#endif

/* ============================ FreeBSD systems ============================ */
#if !1

/* Reference:
   <http://www.freebsd.org/cgi/man.cgi?query=getprogname&sektion=3>
 */

#include <stdlib.h>

/* The part after the last slash (if any) of the value of argv[0].
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ /bin/foo                       -> "foo"
 */
static const char *
get_program_invocation_short_name ()
{
  return getprogname ();
}

#include <stdio.h>

int
main ()
{
  printf ("get_program_invocation_short_name() = %s\n", get_program_invocation_short_name ());
  return 0;
}

#endif

/* ============================ NetBSD systems ============================ */
#if !1

#include <stdlib.h>

/* The part after the last slash (if any) of the value of argv[0].
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ /bin/foo                       -> "foo"
 */
static const char *
get_program_invocation_short_name ()
{
  return getprogname ();
}

#include <stdio.h>

int
main ()
{
  printf ("get_program_invocation_short_name() = %s\n", get_program_invocation_short_name ());
  return 0;
}

#endif

/* ============================ OpenBSD systems ============================ */
#if !1

/* The variable __progname is defined in /usr/lib/crt0.o.  */

/* The part after the last slash (if any) of the value of argv[0].
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ /bin/foo                       -> "foo"
 */
static const char *
get_program_invocation_short_name ()
{
  extern const char *__progname;
  return __progname;
}

#include <stdio.h>

int
main ()
{
  printf ("get_program_invocation_short_name() = %s\n", get_program_invocation_short_name ());
  return 0;
}

#endif

/* ============================== AIX systems ============================== */
#if !1

/* Idea by Bastien ROUCARIÈS.  */
/* <http://lists.gnu.org/archive/html/bug-gnulib/2010-12/msg00095.html> */

/* Reference:
   <http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.basetechref/doc/basetrf1/getprocs.htm>
 */

#include <limits.h>  /* PATH_MAX */
#include <unistd.h>
#include <procinfo.h>
#include <string.h>

/* The part after the last slash (if any) of the value of argv[0],
   truncated to at most 32 bytes.
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ /bin/foo                       -> "foo"
 */
static char *
get_program_invocation_short_name_truncated ()
{
  extern int getprocs64 (struct procentry64 *, int, struct fdsinfo64 *, int,
                         pid_t *, int);
  pid_t pid = getpid ();
  struct procentry64 procs;
  if (getprocs64 (&procs, sizeof (procs), NULL, 0, &pid, 1) > 0)
    return strdup (procs.pi_comm);
  return NULL;
}

/* The value of argv[0].
   $ foo                            -> "foo"
   $ ./foo                          -> "./foo"
   $ /bin/foo                       -> "/bin/foo"
 */
static char *
get_program_invocation_name ()
{
  extern int getargs (void *, int, char *, int);
  char arg0[PATH_MAX + 1];
  struct procentry64 procs;
  procs.pi_pid = getpid ();
  if (getargs (&procs, sizeof (procs), arg0, sizeof (arg0)) == 0)
    /* arg0 is always NUL terminated.  */
    return strdup (arg0);
  return NULL;
}

#include <stdio.h>

int
main ()
{
  printf ("get_program_invocation_short_name_truncated() = %s\n", get_program_invocation_short_name_truncated ());
  printf ("get_program_invocation_name() = %s\n", get_program_invocation_name ());
  return 0;
}

#endif

/* ============================= HP-UX systems ============================= */
#if !1

/* Idea by Bastien ROUCARIÈS.  */
/* <http://lists.gnu.org/archive/html/bug-gnulib/2010-11/msg00203.html> */

/* Reference: <http://docs.hp.com/en/B2355-90682/pstat.2.html> */

#include <unistd.h>
#include <sys/pstat.h>
#include <string.h>

/* The part after the last slash (if any) of the value of argv[0],
   truncated to at most 14 bytes.
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ /bin/foo                       -> "foo"
 */
static char *
get_program_invocation_short_name_truncated ()
{
  int pid = getpid ();
  struct pst_status buf;

  if (pstat_getproc (&buf, sizeof (buf), 0, pid) > 0)
    return strdup (buf.pst_ucomm);
  return NULL;
}

#include <stdio.h>

int
main ()
{
  printf ("get_program_invocation_short_name_truncated() = %s\n", get_program_invocation_short_name_truncated ());
  return 0;
}

#endif

/* ============================= IRIX systems ============================= */
#if !1

/* Idea by Bastien ROUCARIÈS.  */
/* <http://lists.gnu.org/archive/html/bug-gnulib/2010-12/msg00096.html> */

#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/procfs.h>

static size_t
strnlen (const char *string, size_t maxlen)
{
  const char *end = memchr (string, '\0', maxlen);
  return end ? (size_t) (end - string) : maxlen;
}

static char *
strndup (char const *s, size_t n)
{
  size_t len = strnlen (s, n);
  char *new = malloc (len + 1);

  if (new == NULL)
    return NULL;

  new[len] = '\0';
  return memcpy (new, s, len);
}

/* The part after the last slash (if any) of the value of argv[0],
   truncated to at most 32 bytes.
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ /bin/foo                       -> "foo"
 */
static char *
get_program_invocation_short_name_truncated ()
{
  char filename[50];
  int fd;

  sprintf (filename, "/proc/pinfo/%d", (int) getpid ());
  fd = open (filename, O_RDONLY);
  if (fd >= 0)
    {
      prpsinfo_t buf;

      if (ioctl (fd, PIOCPSINFO, &buf) >= 0)
        {
          close (fd);
          return strndup (buf.pr_fname, sizeof (buf.pr_fname));
        }
      close (fd);
    }
  return NULL;
}

int
main ()
{
  printf ("get_program_invocation_short_name_truncated() = %s\n", get_program_invocation_short_name_truncated ());
  return 0;
}

#endif

/* ============================= OSF/1 systems ============================= */
#if !1

#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/procfs.h>

static size_t
strnlen (const char *string, size_t maxlen)
{
  const char *end = memchr (string, '\0', maxlen);
  return end ? (size_t) (end - string) : maxlen;
}

static char *
strndup (char const *s, size_t n)
{
  size_t len = strnlen (s, n);
  char *new = malloc (len + 1);

  if (new == NULL)
    return NULL;

  new[len] = '\0';
  return memcpy (new, s, len);
}

/* The part after the last slash (if any) of the value of argv[0],
   truncated to at most 16 bytes.
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ /bin/foo                       -> "foo"
 */
static char *
get_program_invocation_short_name_truncated ()
{
  char filename[50];
  int fd;

  sprintf (filename, "/proc/%d", (int) getpid ());
  fd = open (filename, O_RDONLY);
  if (fd >= 0)
    {
      prpsinfo_t buf;

      if (ioctl (fd, PIOCPSINFO, &buf) >= 0)
        {
          close (fd);
          return strndup (buf.pr_fname, sizeof (buf.pr_fname));
        }
      close (fd);
    }
  return NULL;
}

int
main ()
{
  printf ("get_program_invocation_short_name_truncated() = %s\n", get_program_invocation_short_name_truncated ());
  return 0;
}

#endif

/* ============================ Solaris systems ============================ */
#if !1

/* Reference for getexecname:
   <http://docs.sun.com/app/docs/doc/816-5168/getexecname-3c?l=en&a=view>
 */

#include <stdlib.h>

/* The value of argv[0], looked up in PATH, with symlinks resolved and
   "." components removed.
   $ foo                            -> "foo" or "subdir/foo" or "/bin/foo"
   $ ./foo                          -> "foo"
   $ subdir/foo                     -> "subdir/foo"
   $ symlink/foo                    -> "subdir/foo"
   $ /bin/foo                       -> "/bin/foo"
 */
static const char *
get_resolved_program_invocation_name ()
{
  return getexecname ();
}

#if HAVE_GETPROGNAME /* Solaris 11 2010-11 */

/* The part after the last slash (if any) of the value of argv[0].
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ /bin/foo                       -> "foo"
 */
static const char *
get_program_invocation_short_name ()
{
  return getprogname ();
}

#else

static const char *
get_program_invocation_short_name ()
{
  return NULL;
}

#endif

#include <stdio.h>

int
main ()
{
  printf ("get_resolved_program_invocation_name() = %s\n", get_resolved_program_invocation_name () != NULL ? get_resolved_program_invocation_name () : "(null)");
  printf ("get_program_invocation_short_name() = %s\n", get_program_invocation_short_name () != NULL ? get_program_invocation_short_name () : "(null)");
  return 0;
}

#endif

/* ============================ Cygwin systems ============================ */
#if !1

/* References:
   <http://www.cygwin.com/cygwin-api/std-bsd.html>
   <http://msdn.microsoft.com/en-us/library/ms683197(v=vs.85).aspx>
 */

#include <stdlib.h>

/* The part after the last slash (if any) of the value of argv[0].
   $ foo                            -> "foo"
   $ ./foo                          -> "foo"
   $ ./foo.exe                      -> "foo"
   $ /bin/foo                       -> "foo"
 */
static const char *
get_program_invocation_short_name ()
{
  return getprogname ();
}

#define WIN32_LEAN_AND_MEAN  /* avoid including junk */
#include <windows.h>

#include <string.h>

/* The canonicalized file name of the program.
   $ foo                            -> "/cygdrive/c/gnu/bin/foo.exe"
   $ ./foo                          -> "/cygdrive/c/gnu/bin/foo.exe"
   $ /bin/foo                       -> "/cygdrive/c/bin/foo.exe"
 */
static char *
get_program_canonicalized_name1 ()
{
  char location[MAX_PATH + 1];
  char location_as_posix_path[2 * MAX_PATH];
  int length = GetModuleFileName (NULL, location, sizeof (location) - 1);
  /* GetModuleFileName returns 0 on failure.  */
  if (length <= 0)
    return NULL;
  if (length == sizeof (location) - 1)
    location[length] = '\0';
  /* On Cygwin, we need to convert paths coming from Win32 system calls
     to the Unix-like slashified notation.  */
  /* There's no error return defined for cygwin_conv_to_posix_path.
     See cygwin-api/func-cygwin-conv-to-posix-path.html.
     Does it overflow the buffer of expected size MAX_PATH or does it
     truncate the path?  I don't know.  Let's catch both.  */
  cygwin_conv_to_posix_path (location, location_as_posix_path);
  location_as_posix_path[MAX_PATH - 1] = '\0';
  if (strlen (location_as_posix_path) >= MAX_PATH - 1)
    /* A sign of buffer overflow or path truncation.  */
    return NULL;
  return strdup (location_as_posix_path);
}

/* An alternative implementation, that works with cygwin-1.5.13 (2005-03-01)
   or newer.  */

#include <limits.h>
#include <string.h>
#include <unistd.h>

/* The canonicalized file name of the program.
   $ foo                            -> "/opt/gnu/bin/foo"
   $ ./foo                          -> "/opt/gnu/bin/foo"
   $ /bin/foo                       -> "/opt/gnu/bin/foo"
 */
static char *
get_program_canonicalized_name2 ()
{
  char buf[PATH_MAX];
  int ret;

  ret = readlink ("/proc/self/exe", buf, sizeof (buf));
  if (ret < 0)
    return NULL;
  return strndup (buf, ret);
}

#include <stdio.h>

int
main ()
{
  printf ("get_program_invocation_short_name() = %s\n", get_program_invocation_short_name ());
  printf ("get_program_canonicalized_name1() = %s\n", get_program_canonicalized_name1 ());
  printf ("get_program_canonicalized_name2() = %s\n", get_program_canonicalized_name2 ());
  return 0;
}

#endif

/* ============================= mingw systems ============================= */
#if !1

/* Reference:
   <http://msdn.microsoft.com/en-us/library/ms683197(v=vs.85).aspx>
 */

#define WIN32_LEAN_AND_MEAN  /* avoid including junk */
#include <windows.h>

#include <string.h>

/* The canonicalized file name of the program.
   $ foo                            -> "C:\\gnu\\bin\\foo.exe"
   $ ./foo                          -> "C:\\gnu\\bin\\foo.exe"
   $ /bin/foo                       -> "C:\\bin\\foo.exe"
 */
static char *
get_program_canonicalized_name ()
{
  char location[MAX_PATH + 1];
  int length = GetModuleFileName (NULL, location, sizeof (location) - 1);
  /* GetModuleFileName returns 0 on failure.  */
  if (length <= 0)
    return NULL;
  if (length == sizeof (location) - 1)
    location[length] = '\0';
  return strdup (location);
}

#include <stdio.h>

int
main ()
{
  printf ("get_program_canonicalized_name() = %s\n", get_program_canonicalized_name ());
  return 0;
}

#endif

/* ============================ Interix systems ============================ */

/* ============================= BeOS systems ============================= */
