Hello. Knowing that a warm reset doesn't come up clean is useful
information.
Looking at mfii.c, there appear to be two possible sources of the
problem. The first is the one I mentioned earlier: interrupt handling
somehow gets mangled during operation, and interrupts stop being
received from the Perc controller. The second is that the Perc
controller itself gets into a weird state, causing its firmware to
stop completing requests.
I'm not sure which source to look at first, so here are some suggestions.
1. Before the problem occurs, can you capture some dmesg output showing how
the mfii devices attach and what interrupts they're using?
2. What does the output of vmstat -i look like when things are working?
3. Have you brought up the Perc's RAID configuration menu to confirm the RAID
sets are healthy and that you're not getting any disk errors which might be
masked from NetBSD itself? I've seen this sort of behavior when a disk is
throwing errors; the Perc firmware is so busy dealing with the problem disk
that it stops responding to the mfii(4) driver. Unfortunately, the NetBSD
driver isn't very good about reporting these kinds of errors; I'm not sure if
that's a problem with the mfii(4) driver or with the firmware on the Perc
itself.
Because the errors happen at random intervals after the machine boots, it's
possible the issue is a good old-fashioned failing disk.
I do realize you see the errors on two separate controllers, which is
why I'm leaning toward an interrupt issue, but it would be good to verify
your disks are good.
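If it helps with the interrupt check, here is a rough Python sketch for
comparing two saved copies of vmstat -i output, one taken while things are
working and one taken after the hang. It assumes each data line ends with a
total and a rate column, with the interrupt name in front; please check that
against what your system actually prints. The function names and the sample
numbers are made up for illustration, and I'm not claiming any particular
line belongs to mfii; match the names against what dmesg shows.

```python
import re

def parse_vmstat_i(text):
    """Map interrupt source name -> total count from saved `vmstat -i` text."""
    counts = {}
    for line in text.splitlines():
        # Assumed layout: "<name ...> <total> <rate>". Lines that don't end
        # in two integers (such as the column header) are skipped.
        m = re.match(r"^(.*\S)\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            counts[m.group(1).strip()] = int(m.group(2))
    return counts

def stalled_sources(before, after):
    """Names present in both snapshots whose total did not increase."""
    return sorted(n for n in before if n in after and after[n] <= before[n])

# Hypothetical sample data, not taken from a real machine.
SAMPLE_BEFORE = """interrupt                                     total     rate
cpu0 timer                                    10000      100
ioapic0 pin 16                                 5000       50
"""
SAMPLE_AFTER = """interrupt                                     total     rate
cpu0 timer                                    11000      100
ioapic0 pin 16                                 5000       45
"""

print(stalled_sources(parse_vmstat_i(SAMPLE_BEFORE),
                      parse_vmstat_i(SAMPLE_AFTER)))
# prints ['ioapic0 pin 16']
```

A source that shows up as stalled only means something if there was disk
activity between the two snapshots; an idle controller won't raise
interrupts either.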
Hope that helps.
-Brian