Thanks for the feedback, Tom. I'm not sure whether it's the Drive or the Controller going bad. The Root Partition is normally left corrupted when this occurs and requires fsck to repair the filesystem.
Previous versions of LM (all the way back to 6.0) have been rock solid. In fact, I helped debug the Fast Wide Differential SCSI drivers for the Slackware version. I've not had to deal with Modular Kernels, as I prefer to build a Customized Kernel and restrict the introduction of Martian Modules on the Server.

I appreciate your assistance. I'll keep you posted.

Tom wrote:

> Sort of looks like a hardware problem. I have run SCSI for several years and
> the last time I saw that sort of message, one of the hard drives had messed
> up. Your message implies several hard drives on the controller. I would first
> try going to one HD programmed as device 0 (zero) and double check the cable
> connections, termination, etc. Do not assume anything about the SCSI chain.
> Try an older version of LM, like 8.1 or 8.2 (in case a severe software bug
> crept into the distro). Let us know how you make out on this problem.
>
> Cheers!
>
> On Friday 22 November 2002 11:58 am, you wrote:
> > I am attempting to implement LM 9.0 on a 1.2GHz, 1GB Ram Server with
> > Adaptec 2940 SCSI disks. Unfortunately the machine dies without
> > warning.
> >
> > While I read with interest the Jack Coates description of the kernel for
> > 2.4 I dug out the Kernel info for my machine and found the following
> > Kernel warnings.
> >
> > Nov 22 04:06:22 access kernel: scsi0: WARNING no command
> > for scb 0 (cmdcmplt)
> > Nov 22 04:06:22 access kernel: QOUTPOS = 33
> > Nov 22 04:07:22 access kernel: scsi0:0:0:0: Attempting to
> > queue an ABORT message
> > Nov 22 04:07:22 access kernel: scsi0: Dumping Card State
> > while idle, at SEQADDR 0x7
> > Nov 22 04:07:22 access kernel: ACCUM = 0xc, SINDEX = 0xe,
> > DINDEX = 0x8c, ARG_2 = 0x0
> > Nov 22 04:07:22 access kernel: HCNT = 0x0 SCBPTR = 0xe
> > Nov 22 04:07:22 access kernel: SCSISEQ = 0x12, SBLKCTL =
> > 0x2
> > Nov 22 04:07:22 access kernel: DFCNTRL = 0x0, DFSTATUS =
> > 0x29
> > Nov 22 04:07:22 access kernel: LASTPHASE = 0x1, SCSISIGI =
> > 0x0, SXFRCTL0 = 0x80
> > Nov 22 04:07:22 access kernel: SSTAT0 = 0x5, SSTAT1 = 0xa
> > Nov 22 04:07:22 access kernel: STACK == 0x0, 0x147, 0xec,
> > 0x3
> > Nov 22 04:07:22 access kernel: SCB count = 254
> > Nov 22 04:07:22 access kernel: Kernel NEXTQSCB = 10
> > Nov 22 04:07:22 access kernel: Card NEXTQSCB = 10
> > Nov 22 04:07:22 access kernel: QINFIFO entries:
> > Nov 22 04:07:22 access kernel: Waiting Queue entries:
> > Nov 22 04:07:22 access kernel: Disconnected Queue entries:
> > Nov 22 04:07:22 access kernel: QOUTFIFO entries:
> > Nov 22 04:07:22 access kernel: Sequencer Free SCB List: 14
> > 3 5 0 10 15 13 2 4 9 12 7 11 8 6 1
> > Nov 22 04:07:22 access kernel: Sequencer SCB Info: 0(c
> > 0x60, s 0x7, l 0, t 0xff) 1(c 0x60, s 0x7, l 0, t 0xff)
> > 2(c 0x60, s 0x7, l 0, t 0xff) 3(c 0x60, s 0x7, l 0, t
> > 0xff) 4(c 0x60, s 0x7, l 0, t 0xff) 5(c 0x60, s 0x7, l 0,
> > t 0xff) 6(c 0x60, s 0x37, l 0, t 0xff) 7(c 0x60, s 0x7, l
> > 0, t 0xff) 8(c 0x60, s 0x7, l 0, t 0xff) 9(c 0x60, s 0x7,
> > l 0, t 0xff) 10(c 0x60, s 0x7, l 0, t 0xff) 11(c 0x60, s
> > 0x7, l 0, t 0xff) 12(c 0x60, s 0x7, l 0, t 0xff) 13(c
> > 0x60, s 0x7, l 0, t 0xff) 14(c 0x60, s 0x37, l 0, t 0xff)
> > 15(c 0x60, s 0x7, l 0, t 0xff)
> > Nov 22 04:07:22 access kernel: Pending list: 208(c 0x60, s
> > 0x7, l 0)
> > Nov 22 04:07:22 access kernel: Kernel Free SCB list: 14 4
> > 214 11 103 226 105 204 38 235 47 49 1 28 113 246 238 61
> > 249 115 110 26 6 220 93 229 209 65 102 233 25 50 29 43 237
> > 52 5 67 250 13 3 234 76 64 17 42 36 71 223 112 245 114 111
> > 240 55 218 228 24 63 95 69 225 107 34 16 215 213 54 46 31
> > 232 66 230 252 244 79 119 21 37 239 40 109 75 247 227 98
> > 41 88 108 222 51 231 248 9 216 117 212 219 32 253 19 118
> > 83 104 217 106 73 2 221 211 92 251 241 15 243 23 0 53 96
> > 224 35 236 33 242 81 58 116 123 122 121 120 127 126 125
> > 124 131 130 129 128 135 134 133 132 139 138 137 136 143
> > 142 141 140 147 146 145 144 151 150 149 148 155 154 153
> > 152 159 158 157 156 163 162 161 160 167 166 165 164 171
> > 170 169 168 175 174 173 172 179 178 177 176 183 182 181
> > 180 187 186 185 184 191 190 189 188 195 194 193 192 199
> > 198 197 196 203 202 201 200 207 206 205 100 57 56 60 27 68
> > 70 72 80 12 89 94 99 97 101 18 44 7 210 45 48 62 8 22 20
> > 74 78 82 59 77 85 87 86 39 84 30 91 90
> > Nov 22 04:07:22 access kernel: DevQ(0:0:0): 0 waiting
> > Nov 22 04:07:22 access kernel: DevQ(0:1:0): 0 waiting
> > Nov 22 04:07:22 access kernel: DevQ(0:2:0): 0 waiting
> > Nov 22 04:07:22 access kernel: DevQ(0:3:0): 0 waiting
> > Nov 22 04:07:22 access kernel: DevQ(0:6:0): 0 waiting
> > Nov 22 04:07:22 access kernel: (scsi0:A:0:0): Queuing a
> > recovery SCB
> > Nov 22 04:07:22 access kernel: scsi0:0:0:0: Device is
> > disconnected, re-queuing SCB
> > Nov 22 04:07:22 access kernel: Recovery code sleeping
> > Nov 22 04:07:22 access kernel: (scsi0:A:0:0): Abort Tag
> > Message Sent
> > Nov 22 04:07:22 access kernel: (scsi0:A:0:0): SCB 208 -
> > Abort Tag Completed.
> > Nov 22 04:07:22 access kernel: Recovery SCB completes
> > Nov 22 04:07:22 access kernel: Recovery code awake
> > Nov 22 04:07:22 access kernel: aic7xxx_abort returns
> > 0x2002
> >
> > This is also the Last message in Syslog. Is this an indication that the
> > SCSI Controller is going bad?
>
> ------------------------------------------------------------------------
> Want to buy your Pack or Services from MandrakeSoft?
> Go to http://www.mandrakestore.com

--
Albert E. Whale - CISSP
http://www.abs-comptech.com
----------------------------------------------------------------------
ABS Computer Technology, Inc. - ESM, Computer & Networking Specialists
Sr. Security, Network, and Systems Consultant
Board of Directors - InfraGard - Pittsburgh, PA
