On 28/05/2013 10:39, Udo Grabowski (IMK) wrote:
Hello,we have problems with Oracle, they don't want to serve our service contract because we have installed illumos/OI and they attribute the errors we see (on a Sun X4275) not to a hardware defect, but a software problem with illumos FMA (and the SP does indeed not report a problem, newest firmware installed). Is there indeed a problem with FMA in illumos with this machine type, or do we have a real hardware problem that is not properly diagnosed ? The impact is rather bad, it disables all hardware on PCI it can disable on boot (like network cards) because it thinks the PCI is botched.
It seems this is indeed a bug in FMA, compare the oracle report for it (bug 15820471):A device reporting a large number of correctable errors continutes to report errors while it is being disabled. Each error is queued and diagnosed in order. In rare cases, the last error reported by the device might be queued but not processed for diagnosis until after the device is disabled. This last error is reported as unexpected telemetry, because the device is no longer enabled in the system.
<http://docs.oracle.com/cd/E28853_01/html/E28858/z4001b331394613.html> Does anybody have an idea where that is to be fixed ?
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
May 27 19:04:47 e35fa799-9ea4-c042-a039-cef64ced402f SUNOS-8000-J0 Major
Host : imksung01
Platform : SUN-FIRE-X4275-SERVER Chassis_id : 0943XFG026
Product_sn :
Fault class : defect.sunos.eft.unexpected_telemetry 50%
fault.sunos.eft.unexpected_telemetry 50%
Problem in : dev:////pci@0,0
faulted and taken out of service
Description : The diagnosis engine encountered telemetry from the listed
devices for which it was unable to perform a diagnosis -
Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
information. Refer to http://illumos.org/msg/SUNOS-8000-J0 for
more information.
Response : Error reports have been logged for examination by Sun.
Impact : Automated diagnosis and response for these events will not occur.
Action : Ensure that the latest Solaris Kernel and Predictive Self-Healing
(PSH) patches are installed.
# fmdump -eV
May 27 2013 22:15:43.423088158 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0xc05896c893b01001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,340c@5
(end detector)
bdf = 0x28
device_id = 0x340c
vendor_id = 0x8086
rev_id = 0x13
dev_type = 0x40
pcie_off = 0x90
pcix_off = 0x0
aer_off = 0x100
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x47
pci_bdg_sec_status = 0x0
pci_bdg_ctrl = 0x3
pcie_status = 0x0
pcie_command = 0x26
pcie_dev_cap = 0x8021
pcie_adv_ctl = 0x0
pcie_ue_status = 0x0
pcie_ue_mask = 0x100000
pcie_ue_sev = 0x62030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x0
pcie_ce_mask = 0x0
pcie_rp_status = 0x0
pcie_rp_control = 0x0
pcie_adv_rp_status = 0x0
pcie_adv_rp_command = 0x7
pcie_adv_rp_ce_src_id = 0x0
pcie_adv_rp_ue_src_id = 0x0
remainder = 0x1
severity = 0x1
__ttl = 0x1
__tod = 0x51a3beef 0x1937d01e
May 27 2013 22:15:43.423089201 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0xc05896c893b01001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,340c@5/pci108e,286@0
(end detector)
bdf = 0x1300
device_id = 0x285
vendor_id = 0x9005
rev_id = 0x9
dev_type = 0x0
pcie_off = 0xd0
pcix_off = 0x0
aer_off = 0x100
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x47
pcie_status = 0x0
pcie_command = 0x2036
pcie_dev_cap = 0x6481c2
pcie_adv_ctl = 0xb4
pcie_ue_status = 0x0
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x62011
pcie_ue_hdr0 = 0x5000001
pcie_ue_hdr1 = 0x280003
pcie_ue_hdr2 = 0x14000000
pcie_ue_hdr3 = 0x14000000
pcie_ce_status = 0x0
pcie_ce_mask = 0x0
remainder = 0x0
severity = 0x1
__ttl = 0x1
__tod = 0x51a3beef 0x1937d431
May 28 2013 04:41:39.246379273 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0x114c70346f801001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,340c@5
(end detector)
bdf = 0x28
device_id = 0x340c
vendor_id = 0x8086
rev_id = 0x13
dev_type = 0x40
pcie_off = 0x90
pcix_off = 0x0
aer_off = 0x100
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x47
pci_bdg_sec_status = 0x0
pci_bdg_ctrl = 0x3
pcie_status = 0x0
pcie_command = 0x26
pcie_dev_cap = 0x8021
pcie_adv_ctl = 0x0
pcie_ue_status = 0x0
pcie_ue_mask = 0x100000
pcie_ue_sev = 0x62030
pcie_ue_hdr0 = 0x0
pcie_ue_hdr1 = 0x0
pcie_ue_hdr2 = 0x0
pcie_ue_hdr3 = 0x0
pcie_ce_status = 0x0
pcie_ce_mask = 0x0
pcie_rp_status = 0x0
pcie_rp_control = 0x0
pcie_adv_rp_status = 0x0
pcie_adv_rp_command = 0x7
pcie_adv_rp_ce_src_id = 0x0
pcie_adv_rp_ue_src_id = 0x0
remainder = 0x1
severity = 0x1
__ttl = 0x1
__tod = 0x51a41963 0xeaf7309
May 28 2013 04:41:39.246381160 ereport.io.pci.fabric
nvlist version: 0
class = ereport.io.pci.fabric
ena = 0x114c70346f801001
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@0,0/pci8086,340c@5/pci108e,286@0
(end detector)
bdf = 0x1300
device_id = 0x285
vendor_id = 0x9005
rev_id = 0x9
dev_type = 0x0
pcie_off = 0xd0
pcix_off = 0x0
aer_off = 0x100
ecc_ver = 0x0
pci_status = 0x10
pci_command = 0x47
pcie_status = 0x0
pcie_command = 0x2036
pcie_dev_cap = 0x6481c2
pcie_adv_ctl = 0xb4
pcie_ue_status = 0x0
pcie_ue_mask = 0x180000
pcie_ue_sev = 0x62011
pcie_ue_hdr0 = 0x5000001
pcie_ue_hdr1 = 0x280003
pcie_ue_hdr2 = 0x14000000
pcie_ue_hdr3 = 0x14000000
pcie_ce_status = 0x0
pcie_ce_mask = 0x0
remainder = 0x0
severity = 0x1
__ttl = 0x1
__tod = 0x51a41963 0xeaf7a68
-- Dr.Udo Grabowski Inst.f.Meteorology a.Climate Research IMK-ASF-SAT www.imk-asf.kit.edu/english/sat.php KIT - Karlsruhe Institute of Technology http://www.kit.edu Postfach 3640,76021 Karlsruhe,Germany T:(+49)721 608-26026 F:-926026 ------------------------------------------- illumos-discuss Archives: https://www.listbox.com/member/archive/182180/=now RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175430-2e6923be Modify Your Subscription: https://www.listbox.com/member/?member_id=21175430&id_secret=21175430-6a77cda4 Powered by Listbox: http://www.listbox.com
smime.p7s
Description: S/MIME Cryptographic Signature
