We recently upgraded a server with an Adaptec 2100S card to Linux 2.6.18
(i686 architecture).  After the upgrade we started getting segfaults
from the "raideng" program, identical to the problem described in bug
#332229.

To make a long story short, after some debugging I was able to track
down the problem to a bug in the upstream raidutil source code.

Specifically, the bug in question was triggered under the following
conditions:

  * using the i2o_* modules (instead of dtp_i2o)
  * the i2o_config module is loaded (and /dev/i2octl exists)
  * using the "raidutils" Debian package, version 0.0.6-3 or 0.0.6-3.2 .

The symptoms of this particular problem are:
  * after kicking off "raidutil -L all", there is about a 10-second
    delay, and then the error message "Engine connect failed: Open"

  * running "raideng /VERBOSE" does not produce _any_ output other than
    the "Segmentation fault" error.

  * running "strace raideng" shows the following steps (at the bottom
    of the trace)
       1a) successful "open" of "/dev/i2octl"
       1b) "I2OGETIOPS" ioctl call on the i2octl file handle
       1c) close of i2octl file handle
       2)  unsuccessful attempts to open /dev/dti0 through dtpi17
       3)  "--- SIGSEGV (Segmentation fault) @ 0 (0) ---"

It turns out that is the ioctl call in 1b) that caused the problem.  The
program was passing in a pointer to an "int" variable, but the
I2OGETIOPS ioctl actually expects a pointer to a (largish) data buffer,
and thus the program's stack frame gets clobbered when the ioctl routine
runs.  The program proceeds to look for the old-style device names (step
2) , but then segfaults when that function exits and the return address
stored in the stack is zero (step 3).

As far as I can tell, the "rpm" package that people have been able to
run successfully also has the same bug, but I guess some difference in
the compilation environment there meant that the stack-clobbering didn't
consistently trigger a segfault as it did with the Debian binary. 
However, if I understand the workings of the I2OGETIOPS ioctl correctly,
I think the Red Hat version (or any other based on that original source)
would not work correctly on machines that have more than one I2O adapter
installed.

(That is, due to the way the I2OGETIOPS data buffer is populated, the
fact that the original raideng code is treating data-buffer space as an
"int" variable happens to work if the ioctl reports exactly one active
unit, but I don't belive it would work otherwise.)


I am attaching a patch against raideng/osd_unix.cpp (from raidutils
0.0.6-3.2) which is my attempt at fixing this problem.

I'm not very familiar with the "raidutil" code and certainly could have
missed something, but we've built new Debian packages locally after
applying this patch and so far it has worked well for us.


I should mention that the first two "chunks" of the patch (the ones
inside the osdIOrequest and osdOpenEngine functions, respectively) are
not strictly related to the I2OGETIOPS segfault problem, but I added
them to improve the error reporting in situations that raideng can't
find any active device files to open.

The osdOpenEngine patch causes the program to abort with an error
message if no HBAs are found during the program initialization.  At
least as far as using "raideng" with the "raidutil" command, I couldn't
see any reason for raideng to keep running in that situation, but I'm
not familiar enough with the package to know for sure if there is some
other situation where this abort is undesirable....

On the other hand, the osdIOrequest change simply adds a check to make
sure that the given device file was previously found to point to an
active HBA before attempting to open that file.  Since it doesn't seem
likely that the open will work this time if it just failed a second or
two earlier (during the initialization pass), I'm fairly confident that
this is a safe change.

Together, these two changes (or the osdIOrequest change by itself) will
prevent people from getting the misleading "File /dev/dpti17 Could Not
Be Opened" message when the actual problem is that the /dev/i2octl file
doesn't exist.


Anyway, hopefully this patch helps resolve this segfault problem for the
other people who have run into it.

                                                Nathan

p.s. I based my fix on the I2OGETIOPS documentation found in the
kernel Documentation/i2o/ioctl file, e.g.
  http://lxr.linux.no/linux+v2.6.18.5/Documentation/i2o/ioctl#L33

----------------------------------------------------------------------------
Nathan Stratton Treadway  -  [EMAIL PROTECTED]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
--- deborig/raidutils-0.0.6/raideng/osd_unix.cpp	2008-01-28 14:28:45.000000000 -0500
+++ deblocal/raidutils-0.0.6/raideng/osd_unix.cpp	2008-01-28 14:21:41.232389361 -0500
@@ -378,32 +378,44 @@
 
    if(ioMethod==DPT_IO_PASS_THRU)
      {
+       // make sure the device entry represents an active device and not
+       // just a device file that was probed without actually finding a
+       // device, so we don't wait 20 seconds trying to connect to 
+       // it...
+       if(DefaultHbaDev->Flags == 0 )
+         {
+           FormatTimeString(TimeString,time(0));
+           printf("\nosdIDrequest   : %s Fatal error, DefaultHbaDev does not point to an active controller.\n", TimeString);
+           fflush(stdout);
+         }
+       else {
 
-  /* Try To Open The First Adapter Device */
+              /* Try To Open The First Adapter Device */
 
-       for(Index = 0; Index < 20; ++Index)
-        {
-          FileID = open(DefaultHbaDev->NodeName,O_RDONLY);
-          if((FileID == -1)&&(errno == ENOENT))
-           {
-             sleep(1);
-           }
-           else {
-                  break;
-           }
-        }
+                     for(Index = 0; Index < 20; ++Index)
+               {
+                 FileID = open(DefaultHbaDev->NodeName,O_RDONLY);
+                 if((FileID == -1)&&(errno == ENOENT))
+                  {
+                    sleep(1);
+                  }
+                  else {
+                         break;
+                  }
+               }
 
 #ifdef _SINIX_ADDON
-       if (DemoMode)
-           FileID = 99;
+              if (DemoMode)
+                  FileID = 99;
 #endif
-       if(FileID != -1)
-         {
-           retVal = MSG_RTN_COMPLETED;
-           close(FileID);
-         }
-       else printf("\nosdIOrequest : File %s Could Not Be Opened",
-                     DefaultHbaDev->NodeName);
+              if(FileID != -1)
+                {
+                  retVal = MSG_RTN_COMPLETED;
+                  close(FileID);
+                }
+              else printf("\nosdIOrequest : File %s Could Not Be Opened",
+                            DefaultHbaDev->NodeName);
+            }
      }
    if(Verbose)
         printf("\nosdIOrequest   : Return = %x",retVal);
@@ -505,6 +517,15 @@
    retVal = MSG_RTN_COMPLETED;
    NumHBAs = BuildNodeNameList();
 
+   // If there are no HBAs found, abort with an explict error message.
+   if(NumHBAs == 0)
+     {
+     FormatTimeString(TimeString,time(0));
+     printf("\nosdOpenEngine  : %s Fatal error, no active controller device files found.\n", TimeString);
+     retVal = MSG_RTN_FAILED;
+     fflush(stdout);
+     }
+
    if(Verbose)
      {
         FormatTimeString(TimeString,time(0));
@@ -3747,16 +3768,56 @@
    NumEntries = 0;
 
 #  if (defined(_DPT_LINUX_I2O))
+   uCHAR LinuxI2ODataBuff[MAX_I2O_CONTROLLERS];
+
    memset(&pkt, 0, sizeof(EATA_CP));
    HbaDevs[NumEntries].Flags = 0;
    strcpy(HbaDevs[NumEntries].NodeName, DEV_CTL);
-   IoctlRtn = osdSendIoctl(&HbaDevs[NumEntries], I2OGETIOPS, (uCHAR *)&NumEntries, &pkt);
+   IoctlRtn = osdSendIoctl(&HbaDevs[NumEntries], I2OGETIOPS, LinuxI2ODataBuff, &pkt);
    if(!IoctlRtn) {
-     for(i = 0; i < NumEntries; i ++) {
-       HbaDevs[i].Flags = NODE_FILE_VALID_HBA_B | NODE_FILE_I2O_HBA_B;
-       HbaDevs[i].IoAddress = UINTPTR_MAX;
-       strcpy(HbaDevs[i].NodeName, DEV_CTL);
-     }
+     // step through the returned data buffer and look for the 
+     // non-zero entries, which indicate an active IOP.  For each
+     // one we find, add a corresponding entry in HbaDevs.
+     for(i = 0; i < MAX_I2O_CONTROLLERS; i ++) {
+       if ( LinuxI2ODataBuff[i] != 0  ) 
+        {
+          if(NumEntries >= MAX_HAS)
+           {
+             FormatTimeString(TimeString,time(0));
+
+             printf("\nBuildNodeNameList  : %s Warning: Found more than %d Linux I2O Controlers; ignoring those that won't fit in the HbaDevs array.",
+                    TimeString, MAX_HAS);
+
+             fflush(stdout);
+             break;
+            }
+          if(Verbose)
+            {
+              FormatTimeString(TimeString,time(0));
+
+              printf("\nBuildNodeNameList  : %s Found Linux I2O Controler, using %s device file for utility-relative controller number %d.",
+                     TimeString, DEV_CTL, NumEntries);
+
+              fflush(stdout);
+            }
+
+          HbaDevs[NumEntries].Flags = NODE_FILE_VALID_HBA_B | NODE_FILE_I2O_HBA_B;
+          HbaDevs[NumEntries].IoAddress = UINTPTR_MAX;
+          strcpy(HbaDevs[NumEntries].NodeName, DEV_CTL);
+
+          ++NumEntries;
+        }
+       else {
+              // for now, we'll assume that all the active IOP entries
+              // are at the front of the returned buffer.  In order to
+              // support "gaps", we'd need to record the IOP index in the
+              // NodeFiles_S structure and use that instead of HbaNum when
+              // we call the I2OPASSTHRU ioctl (or make sure that
+              // everything that looks at HbaDevs can handle inactive
+              // entries in the middle of the array).
+              break;
+       }
+     } // for(i = 0; i < MAX_I2O_CONTROLLERS; i ++) 
    }
 #  endif
 

Attachment: signature.asc
Description: Digital signature

Reply via email to