So, turns out that if I had naively changed all calls to terror or references to sys_errlist, to using strerror_r, then I would have broken code for Windows and HPUX (and possibly other OSes).

If we are to assume that current code runs fine on all platforms (maybe even AIX an MacOS, for example), then any change/additions made to the code and not ifdeffed appropriately can break on other OSes. On the other hand, too many ifdefs can pollute the code source and render it less readable (though possibly less important).

In the general case what are code contributors responsibilities to adding code regarding OSes besides Linux ?
What OSes does jenkins test on ?
I guess maintainers of code on non-tested platforms are responsible for their own testing ?

How do we avoid the ping-pong effect, i.e. I make a generic change to code which breaks on Windows, then the Windows maintainer reverts changes to break on Solaris for example ? Or does this not happen in actuality ?

On 12/11/2014 11:25 PM, Asokan, M wrote:
Hi Malcom,
   The Windows versions of strerror() and strerror_s() functions are probably meant for 
ANSI C library functions that set errno.  For core Windows API calls (like UNIX system 
calls), one gets the error number by calling GetLastError() function.  In the code 
snippet I sent earlier, the "code" argument is the value returned by 
GetLastError().  Neither strerror() nor strerror_s() will give the correct error message 
for this error code.

You could probably look at libwinutils.c in Hadoop source.  It uses 
FormatMessageW (which returns messages in Unicode.)  My requirement was to 
return messages in current system locale.

-- Asokan
________________________________________
From: malcolm [malcolm.kaval...@oracle.com]
Sent: Thursday, December 11, 2014 4:04 PM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Hi Asok,

I googled and found that windows has strerror, and strerror_s (which is
the strerror_r equivalent).
Is there a reason why you didn't use this call ?

On 12/11/2014 06:27 PM, Asokan, M wrote:
Hi Malcom,
     Recently, I had to work on a function to get system error message on 
various systems.  Here is the piece of code I came up with.  Hope it helps.

static void get_system_error_message(char * buf, int buf_len, int code)
{
#if defined(_WIN32)
      LPVOID lpMsgBuf;
      DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                                   FORMAT_MESSAGE_FROM_SYSTEM |
                                   FORMAT_MESSAGE_IGNORE_INSERTS,
                                   NULL, code,
                                   MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                                                           /* Default language 
*/
                                   (LPTSTR) &lpMsgBuf, 0, NULL);
      if (status > 0)
      {
          strncpy(buf, (char *)lpMsgBuf, buf_len-1);
          buf[buf_len-1] = '\0';
          /* Free the buffer returned by system */
          LocalFree(lpMsgBuf);
      }
      else
      {
          _snprintf(buf, buf_len-1 , "%s %d",
              "Can't get system error message for code", code);
          buf[buf_len-1] = '\0';
      }
#else
#if defined(_HPUX_SOURCE)
      {
          char * msg;
          errno = 0;
          msg = strerror(code);
          if (errno == 0)
          {
              strncpy(buf, msg, buf_len-1);
              buf[buf_len-1] = '\0';
          }
          else
          {
              snprintf(buf, buf_len, "%s %d",
                  "Can't get system error message for code", code);
          }
      }
#else
      if (strerror_r(code, buf, buf_len) != 0)
      {
          snprintf(buf, buf_len, "%s %d",
              "Can't get system error message for code", code);
      }
#endif
#endif
}

Note that HPUX does not have strerror_r() since strerror() itself is 
thread-safe.  Also Windows does not have snprintf().  The equivalent function 
_snprintf() has a subtle difference in its interface.

-- Asokan
________________________________________
From: malcolm [malcolm.kaval...@oracle.com]
Sent: Thursday, December 11, 2014 11:02 AM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Fine with me, I volunteer to do this, if accepted.

On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris 
will mean the libhadoop.so will need to be tied a specific build 
(kernel/include pairing) and therefore limits upward mobility/compatibility.  
That doesn’t seem like a very good idea.

IMO, switching to strerror_r is much preferred, since other than the brain-dead 
GNU libc version, is highly portable and should work regardless of the kernel 
or OS in place.

On Dec 11, 2014, at 5:20 AM, malcolm <malcolm.kaval...@oracle.com> wrote:

FYI, there are a couple more files that reference sys_errlist directly (not 
just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c

On 12/11/2014 07:38 AM, malcolm wrote:
Hi Colin,

Exactly, as you noticed, the problem is the thread-local buffer needed to 
return from terror.
Currently, terror just returns a static string from an array, this is fast, 
simple and error-proof.

In order to use strerror_r inside terror,  would require allocating a buffer 
inside terror  and depend on the caller to free the buffer after using it, or 
to pass a buffer to terrror (which is basically the same as strerror_r, 
rendering terror redundant).
Both cases require modification outside terror itself, as far as I can tell, no 
simple fix. Unless you have an alternative which I haven't thought of ?

As far as I can tell, we have two choices:

1. Remove terror and replace calls with strerror_r, passing a buffer from the 
callee.
      Advantage: a more modern portable interface.
      Disadvantage: All calls to terror need to be modified, though all seem to 
be in a few files as far as I can tell.

2. Adding a sys_errlist array (ifdeffed for Solaris)
      Advantage: no change to any calls to terror
      Disadvantage: 2 additional files added to source tree (.c and .h) and 
some minor ifdefs only used for Solaris.

I think it is more a question of style than anything else, so I leave you to 
make the call.

Thanks for your patience,
Malcolm





On 12/10/2014 09:54 PM, Colin McCabe wrote:
On Wed, Dec 10, 2014 at 2:31 AM, malcolm <malcolm.kaval...@oracle.com> wrote:
Hi Colin,

Thanks for the hints around JIRAs.

You are correct errno still exists, however sys_errlist does not.

Hadoop uses a function terror (defined in exception.c) which indexes
sys_errlist by errno to return the error message from the array. This
function is called 26 times in various places (in 2.2)

Originally, I thought to replace all calls to terror with strerror, but
there can be issues with multi-threading (it returns a buffer which can be
overwritten), so it seemed simpler just to recreate the sys_errlist message
array.

There is also a multi-threaded version strerror_r where you pass the buffer
as a parameter, but this would necessitate changing every call to terror
with mutiple lines of code.
Why don't you just use strerror_r inside terror()?

I wrote that code originally.  The reason I didn't want to use
strerror_r there is because GNU libc provides a non-POSIX definition
of strerror_r, and forcing it to use the POSIX one is a pain. But you
can do it.  You also will require a thread-local buffer to hold the
return from strerror_r, since it is not guaranteed to be static
(although in practice it is 99% of the time-- another annoyance with
the API).


________________________________



ATTENTION: -----

The information contained in this message (including any files transmitted with 
this message) may contain proprietary, trade secret or other confidential 
and/or legally privileged information. Any pricing information contained in 
this message or in any files transmitted with this message is always 
confidential and cannot be shared with any third parties without prior written 
approval from Syncsort. This message is intended to be read only by the 
individual or entity to whom it is addressed or by their designee. If the 
reader of this message is not the intended recipient, you are on notice that 
any use, disclosure, copying or distribution of this message, in any form, is 
strictly prohibited. If you have received this message in error, please 
immediately notify the sender and/or Syncsort and destroy all copies of this 
message in your possession, custody or control.

Reply via email to