On Tue, 2010-11-30 at 16:27 +1000, Andrew Pollock wrote:
> Hi,
>
> I've just attempted to consolidate two computers into one, putting a Silicon
> Image SiI 3124 4 port eSATA controller (1095:3124) into my existing MythTV
> backend (a Dell Dimension 3100), which has a Hauppauge PVR-350 (4444:0803)
> in it.
>
> The eSATA controller has 4 SATA drives attached to it, in a RAID-10
> configuration, with LVM on top (hosting a JFS filesystem in a logical volume
> if it makes a difference).
>
> The problem I have is that the kernel has been panicking left and right
> since I did this.
>
> After a lot of swearing, and trial and error, I've narrowed down the crux of
> the problem:
>
> If I load the IVTV firmware, and am doing any sort of serious I/O on any of
> the eSATA-connected drives, the kernel will panic (in what seems to be weird
> and wonderful ways).
>
> The ways I've been synthesizing I/O have been to just dd one of the disks to
> /dev/null, or to get mdadm to do a RAID check (echo check >
> /sys/block/md0/md/sync_action)
>
> The way I've been triggering a firmware load is to just cat /dev/video0 out
> to /dev/null
>
> I can be exercising the disks happily, up until I trigger the firmware to
> load, and then BOOM, it panics.
>
> I've tried totally unloading the sata_sil24 driver, loading the firmware,
> and then reloading sata_sil24, with the same results, so I don't believe it's
> an ordering thing (yes, I've seen
> http://ivtvdriver.org/index.php/Firmware#warning:_SCSI_devicename_collisions,
> and I'm not sure this is still relevant, but I figure that unloading
> sata_sil24 and loading the firmware, then loading sata_sil24 again should do
> the trick). I've also tried completely unloading both ivtv and sata_sil24,
> and reloading sata_sil24 first, then ivtv and the firmware. Same deal.
>
> I've got some partial camera photos of some of the panics at
> http://www.andrew.net.au/~apollock/ivtv+sii=oops/ and if it'll help at all,
> I can do the requisite hoop jumping to try and get a higher-resolution
> display attached and hopefully capture the entire back trace.
>
> I've experienced this with 2.6.26 and 2.6.32, and a home-rolled 2.6.36 (I
> tried to use kexec/kdump to get a dump, but the kdump kernel panicked for
> some reason).

When a system panics, it will not write to the file systems.
UNIX systems behave this way because, generally, when one is panicking,
the worst thing to do would be to try to fix something. ;)

> I've also tried disabling hyperthreading (by shutting down the second CPU).
>
> Any thoughts, or assistance would be greatly appreciated.

The photos indicate that this is a VFS/Ext3 problem: the oopses are
occurring in ext3_show_options(). That function appears to just build a
list of the non-default mount options in effect, by comparing them against
the defaults recorded in the filesystem's superblock.
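To make that description of ext3_show_options() concrete, here is a small
Python sketch of the idea. It is not the kernel code: the option table and
the bit values are hypothetical, and the real function works in C on the
kernel's own flag definitions. It compares the options in effect against
the superblock defaults and reports only the differences.

```python
# Hypothetical option table: (bit, name when enabled, name when disabled).
# The bit values are made up for illustration; they are NOT the kernel's.
OPTIONS = [
    (0x1, "user_xattr", "nouser_xattr"),
    (0x2, "acl", "noacl"),
    (0x4, "barrier=1", "barrier=0"),
]


def show_options(in_effect: int, defaults: int) -> str:
    """Return only the options whose state differs from the superblock defaults."""
    parts = []
    for bit, on_name, off_name in OPTIONS:
        now_on = bool(in_effect & bit)
        default_on = bool(defaults & bit)
        if now_on and not default_on:
            parts.append(on_name)   # enabled, but not a default
        elif default_on and not now_on:
            parts.append(off_name)  # a default that has been turned off
    return ",".join(parts)


if __name__ == "__main__":
    # user_xattr and barrier in effect; user_xattr and acl are the defaults.
    print(show_options(in_effect=0x1 | 0x4, defaults=0x1 | 0x2))
```

If the real function is really this simple, an oops inside it points more
toward corrupted data being handed to it than toward a logic bug in the
routine itself.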

A firmware load for ivtv will prompt udev to read some scripts under
/usr/share, twiddle a "loading" node under /sys, and then copy the
firmware files from /lib/firmware into a "firmware" node under /sys.
I'm not sure how that alone could cause the problem.
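For reference, the userspace side of that handshake can be sketched in
Python as below. This assumes the kernel's firmware_class sysfs interface,
where the request directory exposes a "loading" attribute and a "data"
attribute (the image is written into "data"); the function name and the
directory passed in are illustrative, and in practice udev's own scripts do
this work.

```python
import shutil
from pathlib import Path


def load_firmware(sysfs_dev: Path, firmware_file: Path) -> None:
    """Sketch of the userspace firmware-loading handshake.

    Assumes sysfs_dev is a firmware request directory that exposes
    'loading' and 'data' attributes (firmware_class interface).
    """
    loading = sysfs_dev / "loading"
    data = sysfs_dev / "data"
    loading.write_text("1")  # tell the kernel a load is starting
    try:
        with firmware_file.open("rb") as src, data.open("wb") as dst:
            shutil.copyfileobj(src, dst)  # stream the image into the kernel
    except OSError:
        loading.write_text("-1")  # abort the request if the copy fails
        raise
    loading.write_text("0")  # signal that the image is complete
```

Nothing in that sequence touches the block layer directly, which is why the
firmware load on its own looks like an unlikely culprit.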

I'm assuming you have one of the following:
1. (static) corruption in your ext3 filesystems on disk. Just because
the journal says things are clean doesn't mean there is no corruption.
2. (dynamic) corruption caused by a bug in the sata_sil24 driver, alone or
in combination with the rest of the storage stack you use: ext3, dm, LVM,
RAID, etc.
3. (dynamic) corruption because the sata_sil24 hardware and the ivtv
hardware are sharing the same interrupt line and the sata_sil24 driver
doesn't handle that properly. (That's conjecture; I have not checked
the sata_sil24 driver.)

Some things to try:
a. See if the ivtv card and the sata_sil24 card share an IRQ line:
   cat /proc/interrupts
b. See if simply invoking ext3_show_options() during heavy disk activity
is enough to trigger the problem:
   (generate lots of disk activity)
   cat /proc/mounts
   (I think reading /proc/mounts will do it.)
c. Force an fsck on your filesystems:
   Back up any data you care about, knowing that it may be corrupt.
   Temporarily add "forcefsck" to your kernel command line on reboot.
d. E-mail the relevant mailing list for ext2/3/4 filesystem
development. They'll have a much better idea of what's causing your
VFS/Ext3 oopses than I will.
e. Transcribe the 64 "Code:" bytes from your dumps into an e-mail (the
registers, and the EIP and CR2 contents, would be nice too).
I can put the code bytes back into a small binary with xxd, and then run
objdump to get a disassembly showing exactly where in ext3_show_options()
the oops is occurring. That may help narrow down the VFS/Ext3 problem
somewhat, but that alone won't fix the system-level problem you have.
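For item a, rather than eyeballing /proc/interrupts, a short Python helper
can list the IRQ lines whose description mentions both drivers. This is a
heuristic sketch: it assumes the first line of /proc/interrupts is the CPU
header, and that the handlers are registered under names containing
"sata_sil24" and "ivtv" (check the real names in your own output first).

```python
from pathlib import Path
from typing import List


def irqs_mentioning_all(interrupts_text: str, needles: List[str]) -> List[str]:
    """Return IRQ labels whose /proc/interrupts line mentions every needle.

    Substring match, so 'ivtv' also matches a handler named 'ivtv0'.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    hits = []
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        # Skip the per-CPU counters; keep the chip type and handler names.
        description = " ".join(rest.split()[ncpu:])
        if all(needle in description for needle in needles):
            hits.append(label.strip())
    return hits


if __name__ == "__main__":
    path = Path("/proc/interrupts")
    if path.exists():
        shared = irqs_mentioning_all(path.read_text(), ["sata_sil24", "ivtv"])
        print("IRQ lines shared by sata_sil24 and ivtv:", ", ".join(shared) or "none")
```

An empty result only means the two handlers are not listed on the same
line; it says nothing about whether the driver handles shared IRQs correctly.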
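For item e, "xxd -r -p" will turn the hex bytes back into a binary, and the
kernel source tree's scripts/decodecode automates the whole job. As a
self-contained alternative, the Python sketch below does the hex-to-binary
step and also records which byte the oops marked with <..> (the byte at
EIP). The function name is illustrative, and the objdump invocation in the
comment assumes a 32-bit x86 kernel.

```python
import re
from typing import Optional, Tuple

# A code byte is two hex digits, optionally wrapped in <..> to mark EIP.
_TOKEN = re.compile(r"<([0-9a-fA-F]{2})>|([0-9a-fA-F]{2})")


def parse_code_line(code_line: str) -> Tuple[bytes, Optional[int]]:
    """Turn an oops 'Code:' line into raw bytes plus the faulting offset."""
    body = code_line.split("Code:", 1)[-1]
    data = bytearray()
    fault_offset = None
    for tok in body.split():
        m = _TOKEN.fullmatch(tok)
        if m is None:
            # e.g. "Code: Bad EIP value." carries no usable bytes
            raise ValueError(f"unexpected token in Code: line: {tok!r}")
        if m.group(1) is not None:
            fault_offset = len(data)  # index of the byte at EIP
        data.append(int(m.group(1) or m.group(2), 16))
    return bytes(data), fault_offset


if __name__ == "__main__":
    raw, fault = parse_code_line("Code: 55 89 e5 <8b> 40 04 5d c3")
    print(len(raw), "bytes; faulting instruction at offset", fault)
    # Write raw to a file, then disassemble it, e.g. on 32-bit x86:
    #   objdump -D -b binary -m i386 oops-code.bin
```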
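For item b, the two steps can be combined into one small reproduction
harness. This is a sketch under assumptions: it reads a block device
sequentially in a background thread (roughly what "dd ... of=/dev/null"
does) while repeatedly reading /proc/mounts, which makes the kernel call
the show_options hook of each mounted filesystem that provides one
(ext3_show_options() for ext3). The function name is hypothetical, and the
device path you pass in (something like one of the eSATA disks) needs root
to open.

```python
import threading
from pathlib import Path


def stress_show_options(device: str, mounts: str = "/proc/mounts",
                        iterations: int = 1000, chunk: int = 1 << 20) -> int:
    """Read `device` dd-style in the background while re-reading `mounts`.

    Returns how many reads of the mount table completed.
    """
    stop = threading.Event()

    def reader() -> None:
        with open(device, "rb") as f:
            while not stop.is_set():
                if not f.read(chunk):
                    f.seek(0)  # wrap around at the end of the device

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    done = 0
    try:
        for _ in range(iterations):
            Path(mounts).read_text()  # each read formats every mount's options
            done += 1
    finally:
        stop.set()
        t.join()
    return done
```

If the device cannot be opened, the reader thread simply dies and the
harness degrades to reading /proc/mounts alone, so check that the disk
activity is actually happening.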
Regards,
Andy
_______________________________________________
ivtv-users mailing list
[email protected]
http://ivtvdriver.org/mailman/listinfo/ivtv-users