Bottom line:  I think this is a hardware issue with a new mobo I'm using.  I'm hoping for confirmation on that before submitting the mobo for an RMA.

Background:  I recently decided to upgrade my SAN by:
  •  replacing the H8DMI-2 mobo with the latest rev of the H8DME-2 mobo
  •  replacing the 4-core Opterons with 6-core Opterons
  •  replacing OpenIndiana with OmniOS
Everything else is the same.  Specifically, I'm reusing three AOC-SAT2-MV8 PCI-X cards to provide 24 SATA ports.  These cards map to drives as follows:

  AOC-SAT2-MV8 card1: c3t[0-7]d0
  AOC-SAT2-MV8 card2: c4t[0-7]d0
  AOC-SAT2-MV8 card3: c5t[0-7]d0
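
As a sanity check on that mapping, the /dev/dsk names can be traced back through their /devices symlinks to the physical controller nodes (the slice picked below is arbitrary; any slice link for the drive works):

    root@san:~# ls -l /dev/dsk/c3t0d0s2 /dev/dsk/c4t0d0s2 /dev/dsk/c5t0d0s2

Each link resolves to a physical path ending in a Marvell (pci11ab,11ab) controller node, i.e. one of the three AOC-SAT2-MV8 cards -- the same device paths that show up in the ereports further down.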

I then created a zpool with the following layout (zpool status output):

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
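
For completeness, a create command matching that layout would look roughly like this (reconstructed from the listing above, not copied from my shell history):

    root@san:~# zpool create tank \
                  raidz2 c3t0d0 c3t4d0 c4t0d0 c4t4d0 c5t0d0 c5t4d0 \
                  raidz2 c3t1d0 c3t5d0 c4t1d0 c4t5d0 c5t1d0 c5t5d0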

What triggers the kernel panic (it panics immediately):
    root@san:~# cp one-gig-file.dat /tank/



##### BEGIN KERNEL PANIC #####
SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x51146c6a.0x6ecf192 (0x2bc40a83a3)
PLATFORM: i86pc, CSN: -, HOSTNAME: san
SOURCE: SunOS, REV: 5.11 omnios-33fdde4
DESC: Errors have been detected that require a reboot to ensure system
integrity.  See http://illumos.org/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved


panic[cpu0]/thread=ffffff000f4cbc40: pcieb-0: PCI(-X) Express Fatal Error. (0x45)

ffffff000f4cbb70 pcieb:pcieb_intr_handler+1c9 ()
ffffff000f4cbbe0 unix:av_dispatch_autovect+95 ()
ffffff000f4cbc20 unix:dispatch_hardint+36 ()
ffffff000f405a60 unix:switch_sp_and_call+13 ()
ffffff000f405ac0 unix:do_interrupt+a8 ()
ffffff000f405ad0 unix:cmnint+ba ()
ffffff000f405bc0 unix:mach_cpu_idle+6 ()
ffffff000f405bf0 unix:cpu_idle+11a ()
ffffff000f405c00 unix:cpu_idle_adaptive+13 ()
ffffff000f405c20 unix:idle+a7 ()
ffffff000f405c30 unix:thread_start+8 ()

syncing file systems... done
ereport.io.pci.fabric ena=2bc408a03a00001 detector=[ version=0 scheme="dev"
 device-path="/pci@0,0/pci10de,376@a" ] bdf=50 device_id=376 vendor_id=10de
 rev_id=a3 dev_type=40 pcie_off=80 pcix_off=0 aer_off=160 ecc_ver=0 pci_status=
 10 pci_command=47 pci_bdg_sec_status=6000 pci_bdg_ctrl=3 pcie_status=0
 pcie_command=2037 pcie_dev_cap=8001 pcie_adv_ctl=a0 pcie_ue_status=0
 pcie_ue_mask=180000 pcie_ue_sev=62011 pcie_ue_hdr0=0 pcie_ue_hdr1=0
 pcie_ue_hdr2=0 pcie_ue_hdr3=0 pcie_ce_status=0 pcie_ce_mask=0 pcie_rp_status=0
 pcie_rp_control=0 pcie_adv_rp_status=800007c pcie_adv_rp_command=7
 pcie_adv_rp_ce_src_id=0 pcie_adv_rp_ue_src_id=201 remainder=5 severity=1

ereport.io.pci.fabric ena=2bc409115e00001 detector=[ version=0 scheme="dev"
 device-path="/pci@0,0/pci10de,376@a/pci1033,125@0" ] bdf=200 device_id=125
 vendor_id=1033 rev_id=8 dev_type=70 pcie_off=40 pcix_off=54 aer_off=100
 ecc_ver=0 pci_status=10 pci_command=47 pci_bdg_sec_status=2420 pci_bdg_ctrl=7
 pcix_bdg_status=200 pcix_bdg_sec_status=83 pcie_status=20 pcie_command=2027
 pcie_dev_cap=1 pcie_adv_ctl=a0 pcie_ue_status=0 pcie_ue_mask=180000
 pcie_ue_sev=62010 pcie_ue_hdr0=4000001 pcie_ue_hdr1=50000f pcie_ue_hdr2=
 2020000 pcie_ue_hdr3=0 pcie_ce_status=0 pcie_ce_mask=0 pcie_sue_adv_ctl=0
 pcie_sue_status=0 pcie_sue_mask=8 pcie_sue_sev=1340 pcie_sue_hdr0=20003
 pcie_sue_hdr1=a0 pcie_sue_hdr2=10000 pcie_sue_hdr3=0 remainder=4 severity=5

ereport.io.pci.fabric ena=2bc4096d5c00001 detector=[ version=0 scheme="dev"
 device-path="/pci@0,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4" ] bdf=320
 device_id=6081 vendor_id=11ab rev_id=9 dev_type=101 pcie_off=0 pcix_off=60
 aer_off=0 ecc_ver=0 pci_status=2b0 pci_command=157 pcix_status=1830320
 pcix_command=30 remainder=3 severity=1

ereport.io.pci.fabric ena=2bc4098c0500001 detector=[ version=0 scheme="dev"
 device-path="/pci@0,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@6" ] bdf=330
 device_id=6081 vendor_id=11ab rev_id=9 dev_type=101 pcie_off=0 pcix_off=60
 aer_off=0 ecc_ver=0 pci_status=2b0 pci_command=157 pcix_status=1830330
 pcix_command=30 remainder=2 severity=1

ereport.io.pci.fabric ena=2bc409a74200001 detector=[ version=0 scheme="dev"
 device-path="/pci@0,0/pci10de,376@a/pci1033,125@0,1" ] bdf=201 device_id=125
 vendor_id=1033 rev_id=8 dev_type=70 pcie_off=40 pcix_off=54 aer_off=100
 ecc_ver=0 pci_status=10 pci_command=47 pci_bdg_sec_status=6420 pci_bdg_ctrl=7
 pcix_bdg_status=201 pcix_bdg_sec_status=c3 pcie_status=6 pcie_command=2027
 pcie_dev_cap=1 pcie_adv_ctl=a0 pcie_ue_status=0 pcie_ue_mask=180000
 pcie_ue_sev=62010 pcie_ue_hdr0=0 pcie_ue_hdr1=0 pcie_ue_hdr2=0 pcie_ue_hdr3=0
 pcie_ce_status=0 pcie_ce_mask=0 pcie_sue_adv_ctl=c pcie_sue_status=1800
 pcie_sue_mask=8 pcie_sue_sev=1340 pcie_sue_hdr0=20104 pcie_sue_hdr1=a0
 pcie_sue_hdr2=10000 pcie_sue_hdr3=0 pcie_sue_tgt_trans=0 pcie_sue_tgt_addr=0
 pcie_sue_tgt_bdf=ffff remainder=1 severity=45

ereport.io.pci.fabric ena=2bc40a0e5600001 detector=[ version=0 scheme="dev"
 device-path="/pci@0,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6" ] bdf=430
 device_id=6081 vendor_id=11ab rev_id=9 dev_type=101 pcie_off=0 pcix_off=60
 aer_off=0 ecc_ver=0 pci_status=c3b8 pci_command=157 pcix_status=1830430
 pcix_command=30 remainder=0 severity=40

dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
##### END KERNEL PANIC #####
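
After the reboot, the same telemetry and the panic stack can be pulled back out of FMA and the crash dump, roughly like this -- assuming savecore wrote the dump to the default /var/crash/san location (newer setups may leave a compressed vmdump.0 that savecore -f expands first):

    root@san:~# fmdump -eV                      # replays the ereport.io.pci.fabric events above
    root@san:~# cd /var/crash/san
    root@san:/var/crash/san# mdb unix.0 vmcore.0
    > ::status                                  # panic string and dump details
    > $C                                        # the pcieb_intr_handler stack again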

Thoughts?
  •  Is it clearly the mobo?
  •  Could it possibly be the [new] CPUs?
  •  Anything else to try?

Thanks.
Kent

