What does fmdump show?  Dumping the events will tell you the nature of the
fabric error.
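
(A rough sketch, not from the original thread: `fmdump -eV` prints the logged
ereports in full, and the pci_status words in them can be decoded by hand.
The Python below assumes the values are hexadecimal, as they appear in the
panic output quoted further down, and uses the standard PCI Status register
bit layout. The table and function names are made up for illustration.)

```python
# Error-reporting bits of the PCI Status register (config space offset 0x06).
# Assumption: the pci_status values in the ereports are printed in hex.
PCI_STATUS_ERROR_BITS = {
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}


def decode_pci_status(hex_value):
    """Return the names of the error bits set in a hex pci_status string."""
    value = int(hex_value, 16)
    return [
        name
        for bit, name in sorted(PCI_STATUS_ERROR_BITS.items())
        if value & (1 << bit)
    ]


# pci_status values of the three 11ab:6081 devices in the quoted ereports.
for bdf, status in [("320", "2b0"), ("330", "2b0"), ("430", "c3b8")]:
    flags = decode_pci_status(status)
    print("bdf=%s pci_status=%s: %s" % (bdf, status, ", ".join(flags) or "no error bits set"))
```

Under those assumptions, only the card at bdf=430 has error bits set in
pci_status (the parity and system-error ones); the cards at bdf=320 and
bdf=330 show none.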

Michael

On Feb 7, 2013, at 8:49 PM, Kent Watsen <[email protected]> wrote:

> 
> Bottom line:  I think this is a hardware issue with a new mobo I'm using.  
> I'm hoping for confirmation on that before submitting the mobo for an RMA.
> 
> Background:  I recently decided to upgrade my SAN by:
>  - replacing the H8DMI-2 mobo with the latest rev of the H8DME-2 mobo
>  - replacing the 4-core Opterons with 6-core Opterons
>  - replacing OpenIndiana with OmniOS
> Everything else is the same.  Specifically, I'm reusing three AOC-SAT2-MV8 
> PCI-X cards to provide 24 SATA ports.  These cards map to drives as follows:
> 
>   AOC-SAT2-MV8 card1: c3t[0-7]d0
>   AOC-SAT2-MV8 card2: c4t[0-7]d0
>   AOC-SAT2-MV8 card3: c5t[0-7]d0
> 
> And I created a zpool laid out as follows:
> 
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz2-0  ONLINE       0     0     0
>             c3t0d0  ONLINE       0     0     0
>             c3t4d0  ONLINE       0     0     0
>             c4t0d0  ONLINE       0     0     0
>             c4t4d0  ONLINE       0     0     0
>             c5t0d0  ONLINE       0     0     0
>             c5t4d0  ONLINE       0     0     0
>           raidz2-1  ONLINE       0     0     0
>             c3t1d0  ONLINE       0     0     0
>             c3t5d0  ONLINE       0     0     0
>             c4t1d0  ONLINE       0     0     0
>             c4t5d0  ONLINE       0     0     0
>             c5t1d0  ONLINE       0     0     0
>             c5t5d0  ONLINE       0     0     0
> 
> What triggers the kernel panic:  (panic happens immediately)
>     root@san:~# cp one-gig-file.dat /tank/
> 
> 
> 
> ##### BEGIN KERNEL PANIC #####
> SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
> EVENT-TIME: 0x51146c6a.0x6ecf192 (0x2bc40a83a3)
> PLATFORM: i86pc, CSN: -, HOSTNAME: san
> SOURCE: SunOS, REV: 5.11 omnios-33fdde4
> DESC: Errors have been detected that require a reboot to ensure system
> integrity.  See http://illumos.org/msg/SUNOS-8000-0G for more information.
> AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
> IMPACT: The system will sync files, save a crash dump if needed, and reboot
> REC-ACTION: Save the error summary below in case telemetry cannot be saved
> 
> 
> panic[cpu0]/thread=ffffff000f4cbc40: pcieb-0: PCI(-X) Express Fatal Error. (0x45)
> 
> ffffff000f4cbb70 pcieb:pcieb_intr_handler+1c9 ()
> ffffff000f4cbbe0 unix:av_dispatch_autovect+95 ()
> ffffff000f4cbc20 unix:dispatch_hardint+36 ()
> ffffff000f405a60 unix:switch_sp_and_call+13 ()
> ffffff000f405ac0 unix:do_interrupt+a8 ()
> ffffff000f405ad0 unix:cmnint+ba ()
> ffffff000f405bc0 unix:mach_cpu_idle+6 ()
> ffffff000f405bf0 unix:cpu_idle+11a ()
> ffffff000f405c00 unix:cpu_idle_adaptive+13 ()
> ffffff000f405c20 unix:idle+a7 ()
> ffffff000f405c30 unix:thread_start+8 ()
> 
> syncing file systems... done
> ereport.io.pci.fabric ena=2bc408a03a00001 detector=[ version=0 scheme="dev"
>  device-path="/pci@0,0/pci10de,376@a" ] bdf=50 device_id=376 vendor_id=10de
>  rev_id=a3 dev_type=40 pcie_off=80 pcix_off=0 aer_off=160 ecc_ver=0
>  pci_status=10 pci_command=47 pci_bdg_sec_status=6000 pci_bdg_ctrl=3
>  pcie_status=0 pcie_command=2037 pcie_dev_cap=8001 pcie_adv_ctl=a0
>  pcie_ue_status=0 pcie_ue_mask=180000 pcie_ue_sev=62011 pcie_ue_hdr0=0
>  pcie_ue_hdr1=0 pcie_ue_hdr2=0 pcie_ue_hdr3=0 pcie_ce_status=0 pcie_ce_mask=0
>  pcie_rp_status=0
>  pcie_rp_control=0 pcie_adv_rp_status=800007c pcie_adv_rp_command=7
>  pcie_adv_rp_ce_src_id=0 pcie_adv_rp_ue_src_id=201 remainder=5 severity=1
> 
> ereport.io.pci.fabric ena=2bc409115e00001 detector=[ version=0 scheme="dev"
>  device-path="/pci@0,0/pci10de,376@a/pci1033,125@0" ] bdf=200 device_id=125
>  vendor_id=1033 rev_id=8 dev_type=70 pcie_off=40 pcix_off=54 aer_off=100
>  ecc_ver=0 pci_status=10 pci_command=47 pci_bdg_sec_status=2420 pci_bdg_ctrl=7
>  pcix_bdg_status=200 pcix_bdg_sec_status=83 pcie_status=20 pcie_command=2027
>  pcie_dev_cap=1 pcie_adv_ctl=a0 pcie_ue_status=0 pcie_ue_mask=180000
>  pcie_ue_sev=62010 pcie_ue_hdr0=4000001 pcie_ue_hdr1=50000f
>  pcie_ue_hdr2=2020000 pcie_ue_hdr3=0 pcie_ce_status=0 pcie_ce_mask=0 pcie_sue_adv_ctl=0
>  pcie_sue_status=0 pcie_sue_mask=8 pcie_sue_sev=1340 pcie_sue_hdr0=20003
>  pcie_sue_hdr1=a0 pcie_sue_hdr2=10000 pcie_sue_hdr3=0 remainder=4 severity=5
> 
> ereport.io.pci.fabric ena=2bc4096d5c00001 detector=[ version=0 scheme="dev"
>  device-path="/pci@0,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@4" ] bdf=320
>  device_id=6081 vendor_id=11ab rev_id=9 dev_type=101 pcie_off=0 pcix_off=60
>  aer_off=0 ecc_ver=0 pci_status=2b0 pci_command=157 pcix_status=1830320
>  pcix_command=30 remainder=3 severity=1
> 
> ereport.io.pci.fabric ena=2bc4098c0500001 detector=[ version=0 scheme="dev"
>  device-path="/pci@0,0/pci10de,376@a/pci1033,125@0/pci11ab,11ab@6" ] bdf=330
>  device_id=6081 vendor_id=11ab rev_id=9 dev_type=101 pcie_off=0 pcix_off=60
>  aer_off=0 ecc_ver=0 pci_status=2b0 pci_command=157 pcix_status=1830330
>  pcix_command=30 remainder=2 severity=1
> 
> ereport.io.pci.fabric ena=2bc409a74200001 detector=[ version=0 scheme="dev"
>  device-path="/pci@0,0/pci10de,376@a/pci1033,125@0,1" ] bdf=201 device_id=125
>  vendor_id=1033 rev_id=8 dev_type=70 pcie_off=40 pcix_off=54 aer_off=100
>  ecc_ver=0 pci_status=10 pci_command=47 pci_bdg_sec_status=6420 pci_bdg_ctrl=7
>  pcix_bdg_status=201 pcix_bdg_sec_status=c3 pcie_status=6 pcie_command=2027
>  pcie_dev_cap=1 pcie_adv_ctl=a0 pcie_ue_status=0 pcie_ue_mask=180000
>  pcie_ue_sev=62010 pcie_ue_hdr0=0 pcie_ue_hdr1=0 pcie_ue_hdr2=0 pcie_ue_hdr3=0
>  pcie_ce_status=0 pcie_ce_mask=0 pcie_sue_adv_ctl=c pcie_sue_status=1800
>  pcie_sue_mask=8 pcie_sue_sev=1340 pcie_sue_hdr0=20104 pcie_sue_hdr1=a0
>  pcie_sue_hdr2=10000 pcie_sue_hdr3=0 pcie_sue_tgt_trans=0 pcie_sue_tgt_addr=0
>  pcie_sue_tgt_bdf=ffff remainder=1 severity=45
> 
> ereport.io.pci.fabric ena=2bc40a0e5600001 detector=[ version=0 scheme="dev"
>  device-path="/pci@0,0/pci10de,376@a/pci1033,125@0,1/pci11ab,11ab@6" ] bdf=430
>  device_id=6081 vendor_id=11ab rev_id=9 dev_type=101 pcie_off=0 pcix_off=60
>  aer_off=0 ecc_ver=0 pci_status=c3b8 pci_command=157 pcix_status=1830430
>  pcix_command=30 remainder=0 severity=40
> 
> dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> ##### END KERNEL PANIC #####
> 
> Thoughts?
>  - Is it clearly the mobo?
>  - Could it possibly be the [new] CPUs?
>  - Anything else to try?
> 
> Thanks.
> Kent