We recently upgraded a server with an Adaptec 2100S card to Linux 2.6.18 (i686 architecture). After the upgrade we started getting segfaults from the "raideng" program, identical to the problem described in bug #332229.
To make a long story short, after some debugging I was able to track down the problem to a bug in the upstream raidutil source code. Specifically, the bug in question was triggered under the following conditions: * using the i2o_* modules (instead of dtp_i2o) * the i2o_config module is loaded (and /dev/i2octl exists) * using the "raidutils" Debian package, version 0.0.6-3 or 0.0.6-3.2 . The symptoms of this particular problem are: * after kicking off "raidutil -L all", there is about a 10-second delay, and then the error message "Engine connect failed: Open" * running "raideng /VERBOSE" does not produce _any_ output other than the "Segmentation fault" error. * running "strace raideng" shows the following steps (at the bottom of the trace) 1a) successful "open" of "/dev/i2octl" 1b) "I2OGETIOPS" ioctl call on the i2octl file handle 1c) close of i2octl file handle 2) unsuccessful attempts to open /dev/dti0 through dtpi17 3) "--- SIGSEGV (Segmentation fault) @ 0 (0) ---" It turns out that is the ioctl call in 1b) that caused the problem. The program was passing in a pointer to an "int" variable, but the I2OGETIOPS ioctl actually expects a pointer to a (largish) data buffer, and thus the program's stack frame gets clobbered when the ioctl routine runs. The program proceeds to look for the old-style device names (step 2) , but then segfaults when that function exits and the return address stored in the stack is zero (step 3). As far as I can tell, the "rpm" package that people have been able to run successfully also has the same bug, but I guess some difference in the compilation environment there meant that the stack-clobbering didn't consistently trigger a segfault as it did with the Debian binary. However, if I understand the workings of the I2OGETIOPS ioctl correctly, I think the Red Hat version (or any other based on that original source) would not work correctly on machines that have more than one I2O adapter installed. (That is, due to the way the I2OGETIOPS data buffer is populated, the fact that the original raideng code is treating data-buffer space as an "int" variable happens to work if the ioctl reports exactly one active unit, but I don't belive it would work otherwise.) I am attaching a patch against raideng/osd_unix.cpp (from raidutils 0.0.6-3.2) which is my attempt at fixing this problem. I'm not very familiar with the "raidutil" code and certainly could have missed something, but we've built new Debian packages locally after applying this patch and so far it has worked well for us. I should mention that the first two "chunks" of the patch (the ones inside the osdIOrequest and osdOpenEngine functions, respectively) are not strictly related to the I2OGETIOPS segfault problem, but I added them to improve the error reporting in situations that raideng can't find any active device files to open. The osdOpenEngine patch causes the program to abort with an error message if no HBAs are found during the program initialization. At least as far as using "raideng" with the "raidutil" command, I couldn't see any reason for raideng to keep running in that situation, but I'm not familiar enough with the package to know for sure if there is some other situation where this abort is undesirable.... On the other hand, the osdIOrequest change simply adds a check to make sure that the given device file was previously found to point to an active HBA before attempting to open that file. Since it doesn't seem likely that the open will work this time if it just failed a second or two earlier (during the initialization pass), I'm fairly confident that this is a safe change. Together, these two changes (or the osdIOrequest change by itself) will prevent people from getting the misleading "File /dev/dpti17 Could Not Be Opened" message when the actual problem is that the /dev/i2octl file doesn't exist. Anyway, hopefully this patch helps resolve this segfault problem for the other people who have run into it. Nathan p.s. I based my fix on the I2OGETIOPS documentation found in the kernel Documentation/i2o/ioctl file, e.g. http://lxr.linux.no/linux+v2.6.18.5/Documentation/i2o/ioctl#L33 ---------------------------------------------------------------------------- Nathan Stratton Treadway - [EMAIL PROTECTED] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
--- deborig/raidutils-0.0.6/raideng/osd_unix.cpp 2008-01-28 14:28:45.000000000 -0500 +++ deblocal/raidutils-0.0.6/raideng/osd_unix.cpp 2008-01-28 14:21:41.232389361 -0500 @@ -378,32 +378,44 @@ if(ioMethod==DPT_IO_PASS_THRU) { + // make sure the device entry represents an active device and not + // just a device file that was probed without actually finding a + // device, so we don't wait 20 seconds trying to connect to + // it... + if(DefaultHbaDev->Flags == 0 ) + { + FormatTimeString(TimeString,time(0)); + printf("\nosdIDrequest : %s Fatal error, DefaultHbaDev does not point to an active controller.\n", TimeString); + fflush(stdout); + } + else { - /* Try To Open The First Adapter Device */ + /* Try To Open The First Adapter Device */ - for(Index = 0; Index < 20; ++Index) - { - FileID = open(DefaultHbaDev->NodeName,O_RDONLY); - if((FileID == -1)&&(errno == ENOENT)) - { - sleep(1); - } - else { - break; - } - } + for(Index = 0; Index < 20; ++Index) + { + FileID = open(DefaultHbaDev->NodeName,O_RDONLY); + if((FileID == -1)&&(errno == ENOENT)) + { + sleep(1); + } + else { + break; + } + } #ifdef _SINIX_ADDON - if (DemoMode) - FileID = 99; + if (DemoMode) + FileID = 99; #endif - if(FileID != -1) - { - retVal = MSG_RTN_COMPLETED; - close(FileID); - } - else printf("\nosdIOrequest : File %s Could Not Be Opened", - DefaultHbaDev->NodeName); + if(FileID != -1) + { + retVal = MSG_RTN_COMPLETED; + close(FileID); + } + else printf("\nosdIOrequest : File %s Could Not Be Opened", + DefaultHbaDev->NodeName); + } } if(Verbose) printf("\nosdIOrequest : Return = %x",retVal); @@ -505,6 +517,15 @@ retVal = MSG_RTN_COMPLETED; NumHBAs = BuildNodeNameList(); + // If there are no HBAs found, abort with an explict error message. + if(NumHBAs == 0) + { + FormatTimeString(TimeString,time(0)); + printf("\nosdOpenEngine : %s Fatal error, no active controller device files found.\n", TimeString); + retVal = MSG_RTN_FAILED; + fflush(stdout); + } + if(Verbose) { FormatTimeString(TimeString,time(0)); @@ -3747,16 +3768,56 @@ NumEntries = 0; # if (defined(_DPT_LINUX_I2O)) + uCHAR LinuxI2ODataBuff[MAX_I2O_CONTROLLERS]; + memset(&pkt, 0, sizeof(EATA_CP)); HbaDevs[NumEntries].Flags = 0; strcpy(HbaDevs[NumEntries].NodeName, DEV_CTL); - IoctlRtn = osdSendIoctl(&HbaDevs[NumEntries], I2OGETIOPS, (uCHAR *)&NumEntries, &pkt); + IoctlRtn = osdSendIoctl(&HbaDevs[NumEntries], I2OGETIOPS, LinuxI2ODataBuff, &pkt); if(!IoctlRtn) { - for(i = 0; i < NumEntries; i ++) { - HbaDevs[i].Flags = NODE_FILE_VALID_HBA_B | NODE_FILE_I2O_HBA_B; - HbaDevs[i].IoAddress = UINTPTR_MAX; - strcpy(HbaDevs[i].NodeName, DEV_CTL); - } + // step through the returned data buffer and look for the + // non-zero entries, which indicate an active IOP. For each + // one we find, add a corresponding entry in HbaDevs. + for(i = 0; i < MAX_I2O_CONTROLLERS; i ++) { + if ( LinuxI2ODataBuff[i] != 0 ) + { + if(NumEntries >= MAX_HAS) + { + FormatTimeString(TimeString,time(0)); + + printf("\nBuildNodeNameList : %s Warning: Found more than %d Linux I2O Controlers; ignoring those that won't fit in the HbaDevs array.", + TimeString, MAX_HAS); + + fflush(stdout); + break; + } + if(Verbose) + { + FormatTimeString(TimeString,time(0)); + + printf("\nBuildNodeNameList : %s Found Linux I2O Controler, using %s device file for utility-relative controller number %d.", + TimeString, DEV_CTL, NumEntries); + + fflush(stdout); + } + + HbaDevs[NumEntries].Flags = NODE_FILE_VALID_HBA_B | NODE_FILE_I2O_HBA_B; + HbaDevs[NumEntries].IoAddress = UINTPTR_MAX; + strcpy(HbaDevs[NumEntries].NodeName, DEV_CTL); + + ++NumEntries; + } + else { + // for now, we'll assume that all the active IOP entries + // are at the front of the returned buffer. In order to + // support "gaps", we'd need to record the IOP index in the + // NodeFiles_S structure and use that instead of HbaNum when + // we call the I2OPASSTHRU ioctl (or make sure that + // everything that looks at HbaDevs can handle inactive + // entries in the middle of the array). + break; + } + } // for(i = 0; i < MAX_I2O_CONTROLLERS; i ++) } # endif
signature.asc
Description: Digital signature