Dear Mr. Long,

thanks a lot for taking the time to respond, especially given that you're on vacation and that it's almost 2 a.m. your time.

I apologize for the vague wording in my previous mail. I may also have assumed the wrong meaning for some of the terms that occur in the driver's code. Specifically:

> 2. What is a zero-padded FIB? I concede that the AIF handling in the driver
> is sub-par and needs to be revisited, so I'd like to know what you are seeing.
>

I was referring to this:

aac_dequeue_fib: called
aac0: aac_host_command: FIB @ 0xe1984000
aac0: XferState 0
aac0: Command 0
aac0: StructType 0
aac0: Flags 0x0
aac0: Size 0
aac0: SenderSize 0
aac0: SenderAddress 0x0
aac0: RcvrAddress 0x0
aac0: SenderData 0x0
aac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
aac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
aac0: unknown command from controller

The size is zero and the data dump at the end is all zeroes. That's why I called it a "zero-padded FIB". It only shows up when an "unhandled array failure" arrives, i.e. when the machine is about to hang on runtime array degradation or at boot from a degraded array. That normally happens only with SMP+APIC_IO enabled, not with a UP kernel. All the other FIB listings that I've seen contain some non-zero data and claim a non-zero length.

I attached a tarball with some logs to my last message. To see what I'm talking about, please take a look at these:

- runtime array degradation, compare the two logs:
  - SMP, unrecoverable failure:
    logs/DEBUG_CAM_AAC_L2/SMP-2_disk_failed (line 23)
  - UP, system keeps going just fine:
    logs/DEBUG_CAM_AAC_L2/NOSMP-2_disk_failed (line 76)
- boot from a degraded array, compare the two logs:
  - SMP, unrecoverable failure:
    logs/DEBUG_AAC_L4/SMP-3_boot_with_degraded_array_failed (line 296)
  - UP, system boots just fine:
    logs/DEBUG_AAC_L4/NOSMP-3_boot_with_degraded_array_OK (line 273)

> 3. The split and corrupted messages on the console were likely due to
> kernel printfs happening from different contexts at the same time. The
> printf facility has no serializing ability, unfortunately.
>

OK.

> 4. I'm unclear on what you mean by there being a problem in the
> asynchronous handling of device printfs and host command fibs. I'd be
> very interested in more information on this.
>

I didn't mean to say that the cause of my problem lies in that area. I meant to say that I have trouble understanding what's going on. I'm not a skilled coder, and I have a hard time understanding how the driver's code works. I can see where a function is called with some arguments and returns a result. However, when a SCSI command is issued to the controller, the request and the response don't happen within a single function at the driver level. One function queues the command to the controller via the MMIO (?) region, and the response from the controller eventually comes back in an interrupt, which invokes an interrupt handler. The response may be a valid SCSI response to the SCSI command, or a "something went wrong" Monitor event. I am also vaguely aware that the SCSI controller can reorder commands in the queue or process them out of order. Combine this with the unserialized logging and I'm lost :) Sorry.

If there's something specific I should check for, please let me know.

Thanks for being patient with my hasty descriptions :)

Frank Rysanek