I have further characterized the error.  It looks like, at least
during the softraid rebuild process, most DMA commands are sent to the
PCI card and then complete via an IRQ callback before the next command
is sent.  However, the problem I see here sometimes occurs in the following sequence:

- Command for drive 1 is sent to the PCI card via DMA
(sata_promise.c:pdc_packet_start)
- Command for drive 2 is sent to the PCI card via DMA before the
previous command completes
- Command for drive 1 completes (sata_promise.c:pdc_host_intr)

Often the command for drive 2 then times out.

That said, I have also seen this same sequence complete successfully,
either with a second IRQ just for the drive 2 command, or sometimes
with a single IRQ that completes both commands.

I have a workaround that uses a semaphore to strictly serialize all
commands (lock it in pdc_packet_start, unlock it in pdc_host_intr), so
no two commands are ever outstanding at once.  It appears to have a
large performance impact, but at least it lets my softraid device
finish syncing to 100%.
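
For anyone who wants the shape of that workaround without the patch,
here is a rough userspace model of the serialization.  It is NOT the
actual sata_promise.c change; pdc_busy, model_packet_start and
model_host_intr are made-up names.  The model uses a one-slot flag with
a non-blocking try-acquire, which behaves like a binary semaphore: the
slot is taken when a command is sent and released on completion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model, not driver code: a single slot shared by all
 * drives on the card.  true = a command is outstanding. */
static atomic_bool pdc_busy = false;

/* Stand-in for pdc_packet_start: returns true if the command was sent,
 * false if the caller must hold it until the previous one completes. */
static bool model_packet_start(int drive)
{
    if (atomic_exchange(&pdc_busy, true))
        return false;               /* previous command still in flight */
    printf("send command for drive %d\n", drive);
    return true;
}

/* Stand-in for pdc_host_intr: a completion releases the slot. */
static void model_host_intr(int drive)
{
    bool was_busy = atomic_exchange(&pdc_busy, false);
    assert(was_busy);               /* completion without a send is a bug */
    (void)was_busy;
    printf("complete command for drive %d\n", drive);
}
```

With this gate, the drive 2 command in the sequence above is simply held
back until drive 1's completion arrives, which is exactly why it costs
so much throughput.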

I'm looking for other solutions, or a clue as to the actual cause of
the error.  My current theory is that if the second command is sent to
the PCI card via DMA too soon, the card may miss it, so some
rate-limiting might help, if I can figure out how to implement it.
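
One way the rate-limiting idea could look is sketched below as a
userspace model, assuming the real fix would enforce a minimum gap
between consecutive submissions rather than full serialization.
MODEL_MIN_GAP_NS and model_rate_limit_ok are hypothetical, and the
50 us value is an arbitrary placeholder, not a measured timing.  The
caller supplies a monotonic timestamp (in userspace, e.g. from
clock_gettime(CLOCK_MONOTONIC)).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimum spacing between commands; placeholder value. */
#define MODEL_MIN_GAP_NS 50000ULL   /* 50 us */

static uint64_t last_send_ns;
static bool have_sent;

/* Returns true (and records the send) if a command may go out at
 * now_ns; false if it came too soon after the previous one and
 * should be retried later. */
static bool model_rate_limit_ok(uint64_t now_ns)
{
    assert(!have_sent || now_ns >= last_send_ns);  /* clock must be monotonic */
    if (have_sent && now_ns - last_send_ns < MODEL_MIN_GAP_NS)
        return false;
    last_send_ns = now_ns;
    have_sent = true;
    return true;
}
```

Unlike the semaphore, this still allows overlapping commands; it only
spaces out their submission.  Whether any gap is enough is exactly the
open question, so this would only help if the "sent too soon" theory
holds.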

Any comments or suggestions here would be greatly appreciated, thanks!

-- 
Jim Ramsay
"Me fail English?  That's unpossible!"
-
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
