Done, and added the comment as you requested.
I attached a second patch file to the JIRA (with .002 appended as per convention) assuming Jenkins knows to take the latest version, since I understand that I cannot remove the previous patch file .

On 12/16/2014 04:12 AM, Colin McCabe wrote:
Thanks, Malcom.  I reviewed it.  The only thing you still have to do
is hit "submit patch" to get a Jenkins run.  See our HowToContribute
wiki page for more details.

wiki.apache.org/hadoop/HowToContribute

best,
Colin

On Sat, Dec 13, 2014 at 9:22 PM, malcolm <malcolm.kaval...@oracle.com> wrote:
I am checking on the latest release of Solaris 11 and yes, it is still
thread safe (or MT Safe as documented on the man page).

strerror checks the error code, and returns the same "unknown error" string
as terror does, if it receives an invalid code. I checked this on Windows,
Solaris and Linux (though my changes only affect Solaris platforms).

JIRA newbie question:

I have filed the JIRA attaching the patch  HADOOP-11403 against the trunk,
asking for reviewers in the comments section.
Is there any other protocol I should follow ?

Thanks,
Malcolm


On 12/14/2014 01:08 AM, Asokan, M wrote:
Malcom,
     That's great! Is strerror() thread-safe in the recent version of
Solaris?  In any case, to be correct you still need to make sure that the
code passed to strerror() is a valid one.  For this you need to check errno
after the call to strerror().  Please check the code snippet I sent earlier
for HPUX.

-- Asokan
________________________________________
From: malcolm [malcolm.kaval...@oracle.com]
Sent: Saturday, December 13, 2014 3:13 PM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port SOLVED!

Wiping egg off face  ...

After consulting with the Solaris team (and looking at the source code
and man page) ,  it turns out that strerror itself on Solaris is MT-Safe
! (Just like HPUX)

So, after all this effort, all I need to do is modify terror as follows:

      const char* terror(int errnum)
      {

      #if defined(__sun)
         return strerror(errnum); //  MT-Safe under Solaris
      #else
         if ((errnum < 0) || (errnum >= sys_nerr)) {
           return "unknown error.";
         }
         return sys_errlist[errnum];
      #endif
      }

And in two other files where sys_errlist is referenced directly
(NativeIO and hdfs_http_client.c), I replaced this direct access instead
with a call to terror.

Thanks for all your help and patience,

I'll file a JIRA asap,

Cheers,
Malcolm

On 12/13/2014 05:26 PM, malcolm wrote:
Thanks Asokan,

Looked up Gcc's thread local variables, seems a bit complex though and
quite specific to Gnu.

Intialization of the static errlist array should be thread safe i.e.
initially the array is nulled out, and afterwards if two threads write
to the same address, then they would be writing the same string.

But if we are ok with changing 5 files, not just terror, then I would
just remove terror completely and use strerror_r (or the alternatives
for Windows and HP_UX) in the caller code instead i.e. using your
suggestion for a local buffer in the caller, wherever needed. The more
I think about it, the more this seems to be the right thing to do.

Cheers,
Malcolm


On 12/13/2014 04:38 PM, Asokan, M wrote:
Malcom,
      Gcc supports thread-local variables. See

https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html

I am not sure about native compilers on Solaris, HPUX, or AIX.

In any case, I found out that the Windows native code in Hadoop seems
to handle error messages properly. Here is what I found:

$ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grephadoop how to
file a jira

FormatMessage|awk -F: '{print $1}'|sort -u

/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c


/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c


/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c



$ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk
-F: '{print $1}'|sort -u

/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c


/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c


/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c


/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c


/home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c



This means you need not worry about the Windows version of terror().
You need to change five files that contain UNIX specific native code.

I have a question on your suggested implementation:

How do you initialize the static errlist array in a thread-safe manner?

________________________________
Here is another thread-safe implementation that I could come up with:

#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>

#define MESSAGE_BUFFER_SIZE 256

char * getSystemErrorMessage(char * buf, int buf_len, int code) {
#if defined(_HPUX_SOURCE)
     char * msg;
     errno = 0;
     msg = strerror(code);
     if (errno == 0) {
       strncpy(buf, msg, buf_len-1);
       buf[buf_len-1] = '\0';
     } else {
       snprintf(buf, buf_len, "%s %d",
           "Can't get system error message for code", code);
     }
#else
     if (strerror_r(code, buf, buf_len) != 0) {
       snprintf(buf, buf_len, "%s %d",
           "Can't get system error message for code", code);
     }
#endif
     return buf;
}

#define TERROR(code) \
getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)

int main(int argc, char ** argv) {
     if (argc > 1) {
       char messageBuffer[MESSAGE_BUFFER_SIZE];
       int code = atoi(argv[1]);

       fprintf(stderr, "System error for code %s: %s\n", argv[1],
TERROR(code));
     }
     return 0;
}


This changes terror to a macro TERROR and requires all functions that
call TERROR macro to declare the local variable messageBuffer. Since
there are only five files to modify, I think it is not a big effort.
What do you think?

-- Asokan

On 12/13/2014 04:29 AM, malcolm wrote:
Colin,

I am not sure what you mean by a thread-local buffer (in native
code). In Java this is pretty standard, but I couldn't find any
implementation for C code.

Here is the terror function:

       const char* terror(int errnum)
       {
         if ((errnum < 0) || (errnum >= sys_nerr)) {
           return "unknown error.";
         }
         return sys_errlist[errnum];
       }


The interface is identical to strerror, but the implementation is
actually re-entrant since it returns a pointer to a static string.

If I understand your suggestion, the new function would look like this:

      const char* terror(int errnum)
      {
         static char result[65];

         strerror_r(errnum, result, 64);

         return result;
      }

No need for snprintf, strerror_r  has the 'n' bounding built-in.

Of course, this is still non-re-entrant, so unless the caller copies
the returned buffer, before the function is called again, there is a
problem.

After considerable thought, I have come up with this version of
terror, tested OK on Windows, Linux and Solaris:

      #if defined(_WIN32)
      #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
      #endif

      #define MAX_ERRORS 256
      #define MAX_ERROR_LEN 80

      char *terror(int errnum)
      {

         static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
      error messages

         if ( errnum >= 0 && errnum < MAX_ERRORS )
           {
             if ( errlist[errnum][0] == 0 )
               strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);

             return errlist[errnum];
           }
         else
           {
             return "Unknown error";
           }
      }

This version is portable and re-entrant.

On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133,
and on Solaris 11.1 we get 151.

If this is OK with you, I will open a jira for this.


Thanks,
Malcolm


On 12/12/2014 11:10 PM, Colin McCabe wrote:
Just use snprintf to copy the error message from strerror_r into a
thread-local buffer of 64 bytes or so.  Then preserve the existing
terror() interface.

Can you open a jira for this?

best,
Colin

On Thu, Dec 11, 2014 at 8:35 PM,
malcolm<malcolm.kaval...@oracle.com><mailto:malcolm.kaval...@oracle.com>
wrote:
So, turns out that if I had naively changed all calls to terror or
references to sys_errlist, to using strerror_r, then I would have broken
code for Windows and HPUX (and possibly other OSes).

If we are to assume that current code runs fine on all platforms
(maybe even
AIX an MacOS, for example), then any change/additions made to the
code and
not ifdeffed appropriately can break on other OSes. On the other
hand,  too
many ifdefs can pollute the code source and render it less readable
(though
possibly less important).

In the general case what are code contributors responsibilities to
adding
code regarding OSes besides Linux ?
What OSes does jenkins test on ?
I guess maintainers of code on non-tested platforms are responsible for
their own testing ?

How do we avoid the ping-pong effect, i.e. I make a generic change to
code
which breaks on Windows, then the Windows maintainer reverts changes to
break on Solaris for example ? Or does this not happen in actuality ?


On 12/11/2014 11:25 PM, Asokan, M wrote:
Hi Malcom,
       The Windows versions of strerror() and strerror_s() functions are
probably meant for ANSI C library functions that set errno.  For core
Windows API calls (like UNIX system calls), one gets the error number by
calling GetLastError() function.  In the code snippet I sent earlier,
the
"code" argument is the value returned by GetLastError(). Neither
strerror()
nor strerror_s() will give the correct error message for this error
code.

You could probably look at libwinutils.c in Hadoop source.  It uses
FormatMessageW (which returns messages in Unicode.)  My requirement
was to
return messages in current system locale.

-- Asokan
________________________________________
From: malcolm
[malcolm.kaval...@oracle.com<mailto:malcolm.kaval...@oracle.com>]
Sent: Thursday, December 11, 2014 4:04 PM
To:common-dev@hadoop.apache.org<mailto:To:common-dev@hadoop.apache.org>
Subject: Re: Solaris Port

Hi Asok,

I googled and found that windows has strerror, and strerror_s (which is
the strerror_r equivalent).
Is there a reason why you didn't use this call ?

On 12/11/2014 06:27 PM, Asokan, M wrote:
Hi Malcom,
         Recently, I had to work on a function to get system error
message on
various systems.  Here is the piece of code I came up with. Hope it
helps.

static void get_system_error_message(char * buf, int buf_len, int code)
{
#if defined(_WIN32)
          LPVOID lpMsgBuf;
          DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                                       FORMAT_MESSAGE_FROM_SYSTEM |
FORMAT_MESSAGE_IGNORE_INSERTS,
                                       NULL, code,
                                       MAKELANGID(LANG_NEUTRAL,
SUBLANG_DEFAULT),
                                                               /* Default
language */
                                       (LPTSTR) &lpMsgBuf, 0, NULL);
          if (status > 0)
          {
              strncpy(buf, (char *)lpMsgBuf, buf_len-1);
              buf[buf_len-1] = '\0';
              /* Free the buffer returned by system */
              LocalFree(lpMsgBuf);
          }
          else
          {
              _snprintf(buf, buf_len-1 , "%s %d",
                  "Can't get system error message for code", code);
              buf[buf_len-1] = '\0';
          }
#else
#if defined(_HPUX_SOURCE)
          {
              char * msg;
              errno = 0;
              msg = strerror(code);
              if (errno == 0)
              {
                  strncpy(buf, msg, buf_len-1);
                  buf[buf_len-1] = '\0';
              }
              else
              {
                  snprintf(buf, buf_len, "%s %d",
                      "Can't get system error message for code", code);
              }
          }
#else
          if (strerror_r(code, buf, buf_len) != 0)
          {
              snprintf(buf, buf_len, "%s %d",
                  "Can't get system error message for code", code);
          }
#endif
#endif
}

Note that HPUX does not have strerror_r() since strerror() itself is
thread-safe.  Also Windows does not have snprintf().  The equivalent
function _snprintf() has a subtle difference in its interface.

-- Asokan
________________________________________
From: malcolm
[malcolm.kaval...@oracle.com<mailto:malcolm.kaval...@oracle.com>]
Sent: Thursday, December 11, 2014 11:02 AM
To:common-dev@hadoop.apache.org<mailto:To:common-dev@hadoop.apache.org>
Subject: Re: Solaris Port

Fine with me, I volunteer to do this, if accepted.

On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
sys_errlist was removed for a reason.  Creating a fake sys_errlist on
Solaris will mean the libhadoop.so will need to be tied a specific build
(kernel/include pairing) and therefore limits upward
mobility/compatibility.
That doesn’t seem like a very good idea.

IMO, switching to strerror_r is much preferred, since other than the
brain-dead GNU libc version, is highly portable and should work
regardless
of the kernel or OS in place.

On Dec 11, 2014, at 5:20 AM,
malcolm<malcolm.kaval...@oracle.com><mailto:malcolm.kaval...@oracle.com>
wrote:

FYI, there are a couple more files that reference sys_errlist directly
(not just terror within exception.c) , but also hdfs_http_client.c and
NativeiO.c

On 12/11/2014 07:38 AM, malcolm wrote:
Hi Colin,

Exactly, as you noticed, the problem is the thread-local buffer needed
to return from terror.
Currently, terror just returns a static string from an array, this is
fast, simple and error-proof.

In order to use strerror_r inside terror,  would require allocating a
buffer inside terror  and depend on the caller to free the buffer after
using it, or to pass a buffer to terrror (which is basically the same as
strerror_r, rendering terror redundant).
Both cases require modification outside terror itself, as far as I can
tell, no simple fix. Unless you have an alternative which I haven't
thought
of ?

As far as I can tell, we have two choices:

1. Remove terror and replace calls with strerror_r, passing a buffer
from the callee.
          Advantage: a more modern portable interface.
          Disadvantage: All calls to terror need to be modified, though
all seem to be in a few files as far as I can tell.

2. Adding a sys_errlist array (ifdeffed for Solaris)
          Advantage: no change to any calls to terror
          Disadvantage: 2 additional files added to source tree (.c and
.h) and some minor ifdefs only used for Solaris.

I think it is more a question of style than anything else, so I leave
you to make the call.

Thanks for your patience,
Malcolm





On 12/10/2014 09:54 PM, Colin McCabe wrote:
On Wed, Dec 10, 2014 at 2:31 AM, malcolm
<malcolm.kaval...@oracle.com><mailto:malcolm.kaval...@oracle.com> wrote:
Hi Colin,

Thanks for the hints around JIRAs.

You are correct errno still exists, however sys_errlist does not.

Hadoop uses a function terror (defined in exception.c) which indexes
sys_errlist by errno to return the error message from the array.
This
function is called 26 times in various places (in 2.2)

Originally, I thought to replace all calls to terror with strerror,
but
there can be issues with multi-threading (it returns a buffer which
can be
overwritten), so it seemed simpler just to recreate the sys_errlist
message
array.

There is also a multi-threaded version strerror_r where you pass the
buffer
as a parameter, but this would necessitate changing every call to
terror
with mutiple lines of code.
Why don't you just use strerror_r inside terror()?

I wrote that code originally.  The reason I didn't want to use
strerror_r there is because GNU libc provides a non-POSIX definition
of strerror_r, and forcing it to use the POSIX one is a pain. But you
can do it.  You also will require a thread-local buffer to hold the
return from strerror_r, since it is not guaranteed to be static
(although in practice it is 99% of the time-- another annoyance with
the API).


________________________________



ATTENTION: -----

The information contained in this message (including any files
transmitted with this message) may contain proprietary, trade secret or
other confidential and/or legally privileged information. Any pricing
information contained in this message or in any files transmitted
with this
message is always confidential and cannot be shared with any third
parties
without prior written approval from Syncsort. This message is
intended to be
read only by the individual or entity to whom it is addressed or by
their
designee. If the reader of this message is not the intended
recipient, you
are on notice that any use, disclosure, copying or distribution of this
message, in any form, is strictly prohibited. If you have received this
message in error, please immediately notify the sender and/or
Syncsort and
destroy all copies of this message in your possession, custody or
control.



________________________________



ATTENTION: -----

The information contained in this message (including any files transmitted
with this message) may contain proprietary, trade secret or other
confidential and/or legally privileged information. Any pricing information
contained in this message or in any files transmitted with this message is
always confidential and cannot be shared with any third parties without
prior written approval from Syncsort. This message is intended to be read
only by the individual or entity to whom it is addressed or by their
designee. If the reader of this message is not the intended recipient, you
are on notice that any use, disclosure, copying or distribution of this
message, in any form, is strictly prohibited. If you have received this
message in error, please immediately notify the sender and/or Syncsort and
destroy all copies of this message in your possession, custody or control.


Reply via email to