Hi Lisa and Tarik, note that Solaris does not add Nehalem-specific support until 10u7. Lynx/Virgo RQT is running 10u6, which is what we will RR with.
mb

> On 02/17/09 11:01, Lisa Curhan wrote:
>
>> Hi Tarik-
>>
>> Do you agree with Arrow's assessment that this fault is due to a Solaris
>> FMA code problem rather than a hardware problem?
>>
>
> It's hard to say without some more information.
> There could be both a HW issue and a SW issue.
>
> First of all, we know that we're getting an "ereport.cpu.intel.unknown".
> I'm guessing that probably wouldn't happen unless there was some sort of
> CPU error to start with. I'm not sure where that ereport gets generated
> or why it's "unknown".
> Gavin Maltby and Adrian Frost are your best bet since they worked on the
> Intel cpu/mem FMA code.
>
> It looks to me like these ereports should be discarded in the rules.
>     prop upset.disc...@chip/cpu (0)->
>         ereport.cpu.intel.exter...@chip/cpu,
>         ereport.cpu.intel.unkn...@chip/cpu;
>
> However, you got an "undiagnosable" defect with a reason of
> "ereport zero not found in instance tree". This is most likely
> due to a problem with:
> - the DE rules
> - the topology
> - the ereport content (detector fmri)
>
> Can we get a `fmdump -eV` and a `fmtopo -pV` output from the system?
> This would help us figure out why the diagnosis failed.
>
>
>> The e-report points to
>> an Intel CPU, but the SunVTS tests running when this fault occurred did
>> not fail.
>>
>>
>
> SunVTS will not necessarily fail unless it's checking for ereports or faults.
>
> -tarik
>
>> If you or one of the other Solaris FMA experts could get back to us
>> today with some advice it would be much appreciated. This failure mode
>> is occurring both in system test (SAT) and during the RQT testing and
>> Vayu is at a critical point in time where we need to make progress as
>> quickly as possible. I believe we are using X64 S10u6 in this testing,
>> but I'm not sure which patches we have.
>>
>> Lisa
>> ---------------------------------------------------
>> On 02/17/09 01:45, Arrow.luo wrote:
>>
>>
>>> Hi Dauker,
>>>
>>> This failure should not be related to the FMD device; it is a minor
>>> FMA failure that may be caused by incorrect FMA software. The attached
>>> files give a detailed explanation of this failure.
>>>
>>> *Suggested Action for System Administrator*
>>>
>>> Run pkgchk -n SUNWfmd to ensure that fault management software is
>>> installed properly. Contact Sun for support.
>>>
>>> *Details*
>>>
>>> The Message ID: *SUNOS-8000-1L* indicates errors were detected but
>>> the module responsible for diagnosing those errors was unable to
>>> arrive at a diagnosis. This may be caused more commonly by downrev
>>> or incompatible revisions of software, or less frequently by bugs in
>>> the diagnosis module or subsystem.
>>>
>>> For SPARC based systems running Solaris 10, the kernel patch 127111
>>> and Solaris FMA patch 119578 are particularly critical and should be
>>> examined. Ensure that any dependencies documented in the patch
>>> README files are noted and addressed.
>>>
>>> For x64 based systems running Solaris 10, the kernel patch 127112
>>> and Solaris FMA patches 118344 and 125370 are particularly critical
>>> and should be examined. Ensure that any dependencies documented in
>>> the patch README files are noted and addressed.
>>>
>>> Please help check the detailed SUNWfmd package revision and make sure
>>> the latest Solaris FMA patches have been installed.
>>>
>>> Kind regards,
>>> ArrowL
>>>
>>> dauker....@mic.com.tw wrote:
>>>
>>>
>>>> *Dear Lisa*
>>>>
>>>> *The message below shows an fma check failure in the VTS test*
>>>>
>>>> */0215PEV002-1/
>>>> <http://192.168.100.12:8080/nexus/view?domain_name=ALL&batch_name=0215PEV002-1&unit_name=0215PEV002-1>*
>>>>
>>>> *Test suite: SAT*
>>>>
>>>> *Test set: VTS_EXCLUSIVE*
>>>>
>>>> *Test case: fma_filter_faultx*
>>>>
>>>> ##############################################################################
>>>> fmcheck version 1.3 (r10262) starting Tue Feb 17 06:40:44 CST 2009
>>>> ------------------------------------------------------------------------------
>>>> fmcheck: current configuration:
>>>>   fmcheck_code_map =
>>>>     /home/domain/nexus/units/0215PEV002-1/lib/fmcheck/FMA_NC_CODES
>>>>   fmcheck_dump_raw = 1
>>>>   fmcheck_error_log =
>>>>   fmcheck_fault_log =
>>>>   fmcheck_filter_expr =
>>>>   fmcheck_filter_file =
>>>>     /home/domain/nexus/units/0215PEV002-1/cfg/vayu/fma_filter_vayu.pl
>>>>   fmcheck_match_invert = 1
>>>>   fmcheck_msg_format = Fault %{suspect.class} on FRU
>>>>     %{suspect.fru}
>>>>   fmcheck_output_path =
>>>>   fmcheck_show_all = 1
>>>>   fmcheck_source = faultx
>>>>   fmcheck_source_component = Ops::FMD::Log::Indirect
>>>>   fmcheck_use_dict =
>>>>   fmd_root_path =
>>>>   fmdump_binary =
>>>>   fmdump_xml_filename =
>>>>   no_config =
>>>>   no_special =
>>>> ------------------------------------------------------------------------------
>>>> Found 1 matching faultx event
>>>> ------------------------------------------------------------------------------
>>>> Fault Information:
>>>>   UUID: 36f47185-58a4-435e-faf5-b1f36c848e13
>>>>   Time: Tue Feb 17 04:52:03.667716 2009
>>>>   Article: http://www.sun.com/msg/SUNOS-8000-1L
>>>>   Suspects:
>>>>     [0] Class: defect.sunos.eft.undiagnosable_problem
>>>>         Certainty: 100%
>>>>
>>>>   Error(s):
>>>>     [0] Time: Tue Feb 17 04:52:03.649992423 2009
>>>>         Class: ereport.cpu.intel.unknown
>>>>         Detector: hc:///motherboard=0/chip=0/cpu=3
>>>>
>>>> ------------------------------------------------------------------------------
>>>> ==============================================================================
>>>> Raw FMD Event Data
>>>> ==============================================================================
>>>> header = (embedded nvlist)
>>>>   nvlist version: 0
>>>>   creator = fmd
>>>>   hostname = test12-027
>>>>   label = faultx
>>>>   osrel = 5.10
>>>>   osver = Generic_137138-08
>>>>   plat = i86pc
>>>>   uuid = 08f78d8d-430b-67e2-c835-9986a6ef3caa
>>>>   version = 1.2
>>>> (end header)
>>>>
>>>> event-list = (embedded nvlist)
>>>> (start event-list[0])
>>>>   nvlist version: 0
>>>>   version = 0x0
>>>>   class = list.suspect
>>>>   uuid = 36f47185-58a4-435e-faf5-b1f36c848e13
>>>>   diag-time = Tue Feb 17 04:52:03.667716 2009
>>>>   __ttl = 0x1
>>>>   __tod = 0x4999d1f3 0x2afe2250
>>>>   code = SUNOS-8000-1L
>>>>   de =
>>>>     fmd://product-id=ARRAY(0x8565d9c),chassis-id=ARRAY(0x8565dc0),server-id=test12-027/mod-name=eft/:mod_version=1.16
>>>>   error-list = (array of embedded nvlists)
>>>>   (start error-list[0])
>>>>     nvlist version: 0
>>>>     class = ereport.cpu.intel.unknown
>>>>     euid = c9b9-538ab98-45a6d7f-4ba275e-06b4c2b
>>>>     ena = 0x17b839277b106001
>>>>     error-time = Tue Feb 17 04:52:03.649992423 2009
>>>>     IA32_MCG_STATUS = 0x0
>>>>     IA32_MCi_ADDR = 0x181d5de40
>>>>     IA32_MCi_MISC = 0xc944312000085f47
>>>>     IA32_MCi_STATUS = 0xcc0001800001009f
>>>>     __tod = 0x4999d1f3 0x26be18e7
>>>>     __ttl = 0x1
>>>>     bank_msr_offset = 0x420
>>>>     bank_number = 0x8
>>>>     detector = hc:///motherboard=0/chip=0/cpu=3
>>>>     error_uncorrected = 0
>>>>     error_code = 0x9f
>>>>     error_enabled = 0
>>>>     machine_check_in_progress = 0
>>>>     model_specific_error_code = 0x1
>>>>     overflow = 1
>>>>     processor_context_corrupt = 0
>>>>     threshold_based_error_status = No tracking
>>>>   (end error-list[0])
>>>>
>>>>   error-list-sz = 0x1
>>>>   fault-list = (array of embedded nvlists)
>>>>   (start fault-list[0])
>>>>     nvlist version: 0
>>>>     version = 0x0
>>>>     class = defect.sunos.eft.undiagnosable_problem
>>>>     certainty = 100
>>>>     reason = ereport zero not found in instance tree
>>>>   (end fault-list[0])
>>>>
>>>>   fault-status = 0x1
>>>>   fault-list-sz = 0x1
>>>> (end event-list[0])
>>>>
>>>> (end event-list)
>>>>
>>>> ==============================================================================
>>>>
>>>> SCRIPT=fmcheck EXITSTATUS=31 SYMPTOM=261001 TTF=0 TIME=20090217064044
>>>> EVENT_CODE=SUNOS-8000-1L SUSPECT_ERROR_CLASS=ereport.cpu.intel.unknown
>>>> SUSPECT_ERROR_DETECTOR=hc:///motherboard=0/chip=0/cpu=3
>>>> SUSPECT_FAULT_CLASS=defect.sunos.eft.undiagnosable_problem MSG="Fault
>>>> defect.sunos.eft.undiagnosable_problem on FRU unknown"
>>>>
>>>>
>>>> Best Regards
>>>>
>>>> Dauker,
>>>>
>>>> RDD5 Software Design Dept.
>>>> Enterprise System Bize Uinit
>>>> MiTAC International Corp.
>>>> TEL: +886 (3) 3289000 Ext 5211
>>>> FAX: 886 (3) 3963477
>>>>
>>>>
>>> --
>>> Regards.
>>>
>>> Arrow.luo
>>> Operation Engineer
>>> Asian Operation, WWOPS.
>>> Sun Microsystems of California Limited
>>> Phone : x58972/+86(0)20 85109972
>>> Mobile: +86-13929171806
>>> Email : arrow....@sun.com
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Article for Message ID: *SUNOS-8000-1L*
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *Eft cannot produce diagnosis*
>>>
>>> *Type*
>>>
>>>     Defect
>>>
>>> *Severity*
>>>
>>>     Minor
>>>
>>> *Description*
>>>
>>>     The EFT Diagnosis Engine encountered telemetry for which
>>>     it is unable to produce a diagnosis.
>>>
>>> *Automated Response*
>>>
>>>     Error reports from the component will be logged for
>>>     examination by Sun.
>>>
>>> *Impact*
>>>
>>>     Automated diagnosis and response for these events will
>>>     not occur.
>>>
>>> *Suggested Action for System Administrator*
>>>
>>>     Run pkgchk -n SUNWfmd to ensure that fault management
>>>     software is installed properly. Contact Sun for support.
>>>
>>> *Details*
>>>
>>>     The Message ID: *SUNOS-8000-1L* indicates errors were
>>>     detected but the module responsible for diagnosing those
>>>     errors was unable to arrive at a diagnosis. This may be
>>>     caused more commonly by downrev or incompatible
>>>     revisions of software, or less frequently by bugs in the
>>>     diagnosis module or subsystem.
>>>
>>>     For SPARC based systems running Solaris 10, the kernel
>>>     patch 127111 and Solaris FMA patch 119578 are
>>>     particularly critical and should be examined. Ensure
>>>     that any dependencies documented in the patch README
>>>     files are noted and addressed.
>>>
>>>     For x64 based systems running Solaris 10, the kernel
>>>     patch 127112 and Solaris FMA patches 118344 and 125370
>>>     are particularly critical and should be examined. Ensure
>>>     that any dependencies documented in the patch README
>>>     files are noted and addressed.
>>>
>>>     If all patches appear to be current and all dependencies
>>>     are properly addressed, error telemetry will need to be
>>>     examined by Service personnel to make a further
>>>     determination of the problem. Please see the information
>>>     below for certain platform specific patch requirements.
>>>
>>>     For T1000, T2000, Netra T2000, T5120, and T5220 systems,
>>>     please make sure that the software and firmware patch
>>>     levels are as follows:
>>>
>>>     1.) Solaris 5.10: kernel patch 127111-08 or higher
>>>
>>>     2.) Solaris 5.10: FMA Patch 119578-30 or higher
>>>
>>>     3.) System firmware:
>>>
>>>         T5120 and T5220 firmware patch 127580-05 or higher
>>>
>>>         T2000 firmware patch 136927-01 or higher
>>>
>>>         Netra T2000 firmware patch 136929-01 or higher
>>>
>>>         T1000 firmware patch 127577-03 or higher
>>>
>>>     *Please Note -* On some system platforms there could be
>>>     additional information that may disclose a hardware
>>>     fault, see the following:
>>>
>>>     On Sun SPARC Enterprise Mx000 systems, please check the
>>>     output of "showstatus" and "fmdump" from the xscfu for
>>>     possible hardware faults.
>>>
>>>     On Sun Fire 12K, 15K, 20K and E25K systems, please check
>>>     the output of "showlogs -E -p e list 3" from the main
>>>     system controller for information about possible
>>>     hardware faults.
>>>
>>>     On Sun Fire 3800, 48x0, 6800, v1280, E2900, E4900, E6900
>>>     and Netra 1280/1290 systems, please check the output of
>>>     "showlogs -v" from the main system controller for
>>>     information about possible hardware faults.
>>>
>>> ------------------------------------------------------------------------
>>>
>>> For more information, contact Sun Support
>>> <http://www.sun.com/service/online/>.
>>>
>>> ------------------------------------------------------------------------
>>>
>>> Event Registry on events.central:
>>> */events-gate*
>>>
>>> defect.sunos.eft.undiagnosable_problem
>>>
>>> *Stability*
>>>
>>>     Private
>>>
>>> *Description*
>>>
>>>     EFT unable to produce a diagnosis defect
>>>
>>> *Sets containing this event*
>>>
>>>     NotArcd <http://events.central/event/../set/FMA/NotArcd>
>>>
>>> *Event Payload version 0*
>>>
>>> *Event Payload inherited from defect
>>> <http://events/central/event/FMA/defect>*:
>>>
>>>     Name       Type     Description
>>>     certainty  uint8_t  Certainty [0..100] for this entry in list
>>>     class      string   The event class
>>>     module     fmri     The defective software module
>>>     package    fmri     The package containing the defective software
>>>     version    uint8_t  The major version of this event class
>>>
>>> *Event Payload inherited from defect.sunos
>>> <http://events/central/event/FMA/defect/sunos>*: NONE
>>>
>>> *Event Payload inherited from defect.sunos.eft
>>> <http://events/central/event/FMA/defect/sunos/eft>*: NONE
>>>
>>> *Event Payload:*
>>>
>>>     Name    Type    Description
>>>     reason  string  text description of why eft unable
>>>                     to produce a diagnosis
>>>
>>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> fm-discuss mailing list
> fm-discuss@opensolaris.org
>
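A note for anyone reading the raw event data in this thread: the flag fields in the quoted ereport can be cross-checked against the IA32_MCi_STATUS value by hand. Below is a minimal Python sketch, assuming the architectural bit layout described in the Intel SDM (VAL=63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57, model-specific code in bits 31:16, MCA error code in bits 15:0) and assuming that bank_msr_offset is the bank's IA32_MCi_CTL address (0x400 + 4 * bank). The function names and the `valid`, `misc_valid` and `addr_valid` keys are made up for illustration; they are not part of fmd or of the ereport payload.

```python
# Sketch only: decode the architectural fields of an IA32_MCi_STATUS value.
# Bit positions are taken from the Intel SDM layout (an assumption of this
# example, not something stated in the thread).
FLAG_BITS = {
    "valid": 63,                      # hypothetical key name (VAL)
    "overflow": 62,                   # OVER
    "error_uncorrected": 61,          # UC
    "error_enabled": 60,              # EN
    "misc_valid": 59,                 # hypothetical key name (MISCV)
    "addr_valid": 58,                 # hypothetical key name (ADDRV)
    "processor_context_corrupt": 57,  # PCC
}


def decode_mci_status(status: int) -> dict:
    """Return the single-bit flags plus the two 16-bit error code fields."""
    fields = {name: (status >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["error_code"] = status & 0xFFFF
    fields["model_specific_error_code"] = (status >> 16) & 0xFFFF
    return fields


def bank_msr_offset(bank: int) -> int:
    """Address of IA32_MCi_CTL for a bank, assuming the 0x400 + 4*i layout."""
    return 0x400 + 4 * bank


if __name__ == "__main__":
    # Values copied from the ereport quoted in this thread.
    decoded = decode_mci_status(0xCC0001800001009F)
    for name, value in decoded.items():
        print(f"{name:28s} = {value:#x}")
    print(f"{'bank_msr_offset':28s} = {bank_msr_offset(0x8):#x}")
```

With the status word from the quoted ereport this reproduces overflow = 1, error_uncorrected = 0, error_enabled = 0, processor_context_corrupt = 0, error_code = 0x9f and model_specific_error_code = 0x1, and bank 8 maps to offset 0x420, so the payload is at least internally consistent. It says nothing about why eft could not place the ereport in its instance tree.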