Hi Lisa and Tarik,
Note that Solaris does not add Nehalem-specific support until 10u7.
Lynx/Virgo RQT is running 10u6, which is what we will RR with.

mb

> On 02/17/09 11:01, Lisa Curhan wrote:
>   
>> Hi Tarik-
>>
>>  Do you agree with Arrow's assessment that this fault is due to a Solaris
>> FMA code problem rather than a hardware problem?
>>     
>
> It's hard to say without some more information.
> There could be both a HW issue and a SW issue.
>
> First of all, we know that we're getting an "ereport.cpu.intel.unknown".
> I'm guessing that probably wouldn't happen unless there was some sort of
> CPU error to start with. I'm not sure where that ereport gets generated
> or why it's "unknown".
> Gavin Maltby and Adrian Frost are your best bet since they worked on the
> Intel cpu/mem FMA code.
>
> It looks to me like these ereports should be discarded in the rules.
> prop upset.disc...@chip/cpu (0)->
> ereport.cpu.intel.exter...@chip/cpu,
> ereport.cpu.intel.unkn...@chip/cpu;
>
> However you got an "undiagnosable" defect with a reason of
> "ereport zero not found in instance tree". This is most likely
> due to a problem with:
> - the DE rules
> - the topology
> - the ereport content (detector fmri)
>
> Can we get a `fmdump -eV` and a `fmtopo -pV` output from the system?
> This would help us figure out why the diagnosis failed.
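As a rough illustration of the third possibility (the detector FMRI not matching anything in the topology), the sketch below splits an hc-scheme FMRI into its name=instance components and checks whether it appears in a list of topology paths. The helper names `parse_hc_fmri` and `find_in_topology` and the sample topology entries are hypothetical. The real list would come from the `fmtopo -pV` output, and eft's actual matching logic is not shown in this thread.

```python
def parse_hc_fmri(fmri):
    """Split an hc-scheme FMRI into a tuple of (name, instance) pairs."""
    prefix = "hc://"
    if not fmri.startswith(prefix):
        raise ValueError("not an hc-scheme FMRI: %r" % fmri)
    # Drop the scheme and the (possibly empty) authority before the path.
    rest = fmri[len(prefix):]
    path = rest[rest.index("/"):]
    pairs = []
    for component in path.strip("/").split("/"):
        name, _, instance = component.partition("=")
        pairs.append((name, int(instance)))
    return tuple(pairs)


def find_in_topology(detector, topo_fmris):
    """Return True if the detector path matches one of the topology paths."""
    wanted = parse_hc_fmri(detector)
    return any(parse_hc_fmri(f) == wanted for f in topo_fmris)


# Detector taken from the ereport in this thread.
detector = "hc:///motherboard=0/chip=0/cpu=3"

# Hypothetical topology entries, for illustration only.
sample_topology = [
    "hc:///motherboard=0/chip=0/cpu=0",
    "hc:///motherboard=0/chip=0/cpu=1",
]

print(find_in_topology(detector, sample_topology))
```

If the detector path is absent from the real topology, that would point at the topology or the ereport content rather than the rules.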
>
>   
>> The e-report points to
>> an intel CPU, but the SunVTS tests running when this fault occurred did
>> not fail.
>>   
>>     
>
> SunVTS will not necessarily fail unless it's checking for ereports or faults.
>
> -tarik
>   
>>  If you or one of the other Solaris FMA experts could get back to us
>> today with some advice it would be much appreciated.  This failure mode
>> is occurring both in system test (SAT) and during the RQT testing and
>> Vayu is at a critical point in time where we need to make progress as
>> quickly as possible.  I believe we are using X64 S10u6 in this testing,
>> but I'm not sure which patches we have.
>>
>> Lisa
>> ---------------------------------------------------
>> On 02/17/09 01:45, Arrow.luo wrote:
>>   
>>     
>>> Hi Dauker,
>>>
>>> This failure should not be related to the FMD device; it is a minor
>>> FMA failure that may be caused by incorrect FMA software. The attached
>>> files explain this failure in detail.
>>>
>>> *Suggested Action for System Administrator*
>>>
>>>     Run pkgchk -n SUNWfmd to ensure that fault management software is
>>>     installed properly. Contact Sun for support. 
>>>
>>> *Details*
>>>
>>>     The Message ID:   *SUNOS-8000-1L* indicates errors were detected but
>>>     the module responsible for diagnosing those errors was unable to
>>>     arrive at a diagnosis. This may be caused more commonly by downrev
>>>     or incompatible revisions of software, or less frequently by bugs in
>>>     the diagnosis module or subsystem.
>>>
>>>     For SPARC based systems running Solaris 10, the kernel patch 127111
>>>     and Solaris FMA patch 119578 are particularly critical and should be
>>>     examined. Ensure that any dependencies documented in the patch
>>>     README files are noted and addressed.
>>>
>>>     For x64 based systems running Solaris 10, the kernel patch 127112
>>>     and Solaris FMA patches 118344 and 125370 are particularly critical
>>>     and should be examined. Ensure that any dependencies documented in
>>>     the patch README files are noted and addressed.
>>>
>>> Please help check the exact SUNWfmd package revision and confirm whether
>>> the latest Solaris FMA patches have been installed.
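One way to run that check is to scan `showrev -p` style output for the x64 patch IDs named in the article (127112, 118344, 125370). This is a minimal sketch: the `installed_patches` helper is hypothetical, the sample text and its revision numbers are invented for illustration, and it does not follow Obsoletes chains, so a newer kernel patch (the event data in this thread reports osver Generic_137138-08) could supersede an ID that appears to be missing.

```python
import re


def installed_patches(showrev_output):
    """Map patch ID -> highest installed revision from `showrev -p` style text."""
    patches = {}
    for match in re.finditer(r"^Patch:\s+(\d+)-(\d+)", showrev_output, re.M):
        patch_id, rev = match.group(1), int(match.group(2))
        patches[patch_id] = max(rev, patches.get(patch_id, 0))
    return patches


# x64 patch IDs listed in the SUNOS-8000-1L article.
WANTED = ["127112", "118344", "125370"]

# Hypothetical sample input; revisions are made up for illustration.
sample = """\
Patch: 118344-14 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWfmd
Patch: 127112-05 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWckr
"""

found = installed_patches(sample)
for patch_id in WANTED:
    rev = found.get(patch_id)
    print(patch_id, "missing" if rev is None else "rev %02d" % rev)
```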
>>>
>>> Kind regards,
>>> ArrowL
>>>
>>> dauker....@mic.com.tw wrote:
>>>     
>>>       
>>>> *Dear Lisa*
>>>>
>>>> *The message below shows the FMA check failing in the VTS test*
>>>>
>>>> */0215PEV002-1/ 
>>>> <http://192.168.100.12:8080/nexus/view?domain_name=ALL&batch_name=0215PEV002-1&unit_name=0215PEV002-1>*
>>>>
>>>> *Test suite: SAT*
>>>>
>>>> *Test set: VTS_EXCLUSIVE*
>>>>
>>>> *Test case: fma_filter_faultx*
>>>>
>>>> ##############################################################################
>>>> fmcheck version 1.3 (r10262) starting Tue Feb 17 06:40:44 CST 2009
>>>> ------------------------------------------------------------------------------
>>>> fmcheck: current configuration:
>>>>         fmcheck_code_map         = /home/domain/nexus/units/0215PEV002-1/lib/fmcheck/FMA_NC_CODES
>>>>         fmcheck_dump_raw         = 1
>>>>         fmcheck_error_log        =
>>>>         fmcheck_fault_log        =
>>>>         fmcheck_filter_expr      =
>>>>         fmcheck_filter_file      = /home/domain/nexus/units/0215PEV002-1/cfg/vayu/fma_filter_vayu.pl
>>>>         fmcheck_match_invert     = 1
>>>>         fmcheck_msg_format       = Fault %{suspect.class} on FRU %{suspect.fru}
>>>>         fmcheck_output_path      = 
>>>>         fmcheck_show_all         = 1
>>>>         fmcheck_source           = faultx
>>>>         fmcheck_source_component = Ops::FMD::Log::Indirect
>>>>         fmcheck_use_dict         = 
>>>>         fmd_root_path            = 
>>>>         fmdump_binary            = 
>>>>         fmdump_xml_filename      = 
>>>>         no_config                = 
>>>>         no_special               = 
>>>> ------------------------------------------------------------------------------
>>>> Found 1 matching faultx event
>>>> ------------------------------------------------------------------------------
>>>> Fault Information: 
>>>>        UUID: 36f47185-58a4-435e-faf5-b1f36c848e13
>>>>        Time: Tue Feb 17 04:52:03.667716 2009
>>>>     Article: http://www.sun.com/msg/SUNOS-8000-1L
>>>>    Suspects: 
>>>>              [0]    Class: defect.sunos.eft.undiagnosable_problem
>>>>                 Certainty: 100%
>>>>  
>>>>    Error(s): 
>>>>              [0]     Time: Tue Feb 17 04:52:03.649992423 2009
>>>>                     Class: ereport.cpu.intel.unknown
>>>>                  Detector: hc:///motherboard=0/chip=0/cpu=3
>>>>  
>>>> ------------------------------------------------------------------------------
>>>> ==============================================================================
>>>> Raw FMD Event Data
>>>> ==============================================================================
>>>> header = (embedded nvlist)
>>>> nvlist version: 0
>>>>         creator = fmd
>>>>         hostname = test12-027
>>>>         label = faultx
>>>>         osrel = 5.10
>>>>         osver = Generic_137138-08
>>>>         plat = i86pc
>>>>         uuid = 08f78d8d-430b-67e2-c835-9986a6ef3caa
>>>>         version = 1.2
>>>> (end header)
>>>>  
>>>> event-list = (embedded nvlist)
>>>> (start event-list[0])
>>>> nvlist version: 0
>>>>         version = 0x0
>>>>         class = list.suspect
>>>>         uuid = 36f47185-58a4-435e-faf5-b1f36c848e13
>>>>         diag-time = Tue Feb 17 04:52:03.667716 2009
>>>>         __ttl = 0x1
>>>>         __tod = 0x4999d1f3 0x2afe2250
>>>>         code = SUNOS-8000-1L
>>>>         de = fmd://product-id=ARRAY(0x8565d9c),chassis-id=ARRAY(0x8565dc0),server-id=test12-027/mod-name=eft/:mod_version=1.16
>>>>         error-list = (array of embedded nvlists)
>>>>         (start error-list[0])
>>>>         nvlist version: 0
>>>>                 class = ereport.cpu.intel.unknown
>>>>                 euid = c9b9-538ab98-45a6d7f-4ba275e-06b4c2b
>>>>                 ena = 0x17b839277b106001
>>>>                 error-time = Tue Feb 17 04:52:03.649992423 2009
>>>>                 IA32_MCG_STATUS = 0x0
>>>>                 IA32_MCi_ADDR = 0x181d5de40
>>>>                 IA32_MCi_MISC = 0xc944312000085f47
>>>>                 IA32_MCi_STATUS = 0xcc0001800001009f
>>>>                 __tod = 0x4999d1f3 0x26be18e7
>>>>                 __ttl = 0x1
>>>>                 bank_msr_offset = 0x420
>>>>                 bank_number = 0x8
>>>>                 detector = hc:///motherboard=0/chip=0/cpu=3
>>>>                 error_uncorrected = 0
>>>>                 error_code = 0x9f
>>>>                 error_enabled = 0
>>>>                 machine_check_in_progress = 0
>>>>                 model_specific_error_code = 0x1
>>>>                 overflow = 1
>>>>                 processor_context_corrupt = 0
>>>>                 threshold_based_error_status = No tracking
>>>>         (end error-list[0])
>>>>  
>>>>         error-list-sz = 0x1
>>>>         fault-list = (array of embedded nvlists)
>>>>         (start fault-list[0])
>>>>         nvlist version: 0
>>>>                 version = 0x0
>>>>                 class = defect.sunos.eft.undiagnosable_problem
>>>>                 certainty = 100
>>>>                 reason = ereport zero not found in instance tree
>>>>         (end fault-list[0])
>>>>  
>>>>         fault-status = 0x1
>>>>         fault-list-sz = 0x1
>>>> (end event-list[0])
>>>>  
>>>> (end event-list)
>>>>  
>>>> ==============================================================================
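The flag fields in the raw event data above can be cross-checked against the IA32_MCi_STATUS value. The following is a minimal sketch, assuming the architectural bit layout from the Intel SDM (VAL=63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57, model-specific error code in bits 31:16, MCA error code in bits 15:0). The function names are made up for illustration and are not part of Solaris FMA. The second helper further assumes the SDM compound error code pattern for memory controller errors, 000F 0000 1MMM CCCC.

```python
def decode_mci_status(status):
    """Decode a few architectural fields of an IA32_MCi_STATUS value."""
    def bit(n):
        return (status >> n) & 1

    return {
        "valid": bit(63),
        "overflow": bit(62),
        "error_uncorrected": bit(61),
        "error_enabled": bit(60),
        "misc_valid": bit(59),
        "addr_valid": bit(58),
        "processor_context_corrupt": bit(57),
        "model_specific_error_code": (status >> 16) & 0xFFFF,
        "error_code": status & 0xFFFF,
    }


def memory_controller_error(code):
    """Return (MMM, CCCC) if code matches 000F 0000 1MMM CCCC, else None."""
    # Mask out the F, MMM and CCCC bits; what remains must be exactly bit 7.
    if (code & 0xEF80) != 0x0080:
        return None
    return (code >> 4) & 0x7, code & 0xF


# IA32_MCi_STATUS value copied from the ereport in this thread.
fields = decode_mci_status(0xcc0001800001009f)
for name, value in fields.items():
    print("%-28s %#x" % (name, value))
print(memory_controller_error(fields["error_code"]))
```

Under those assumptions, the decoded overflow, error_uncorrected, error_enabled, processor_context_corrupt, error_code and model_specific_error_code values agree with what fmd reported, and error code 0x9f fits the memory-controller pattern with MMM=1 and CCCC=0xf.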
>>>>
>>>> SCRIPT=fmcheck EXITSTATUS=31 SYMPTOM=261001 TTF=0 TIME=20090217064044 
>>>> EVENT_CODE=SUNOS-8000-1L SUSPECT_ERROR_CLASS=ereport.cpu.intel.unknown 
>>>> SUSPECT_ERROR_DETECTOR=hc:///motherboard=0/chip=0/cpu=3 
>>>> SUSPECT_FAULT_CLASS=defect.sunos.eft.undiagnosable_problem MSG="Fault 
>>>> defect.sunos.eft.undiagnosable_problem on FRU unknown"
>>>>
>>>>  
>>>>
>>>>  
>>>>
>>>> Best Regards
>>>>
>>>> Dauker,
>>>>
>>>> //RDD5 Software Design Dept.//
>>>>
>>>> //Enterprise System Business Unit//
>>>>
>>>> //MiTAC International Corp.//
>>>>
>>>> //TEL: +886 (3) 3289000 Ext 5211//
>>>>
>>>> //FAX: 886 (3) 3963477//
>>>>
>>>>  
>>>>
>>>>       
>>>>         
>>> -- 
>>> Regards.
>>>
>>> Arrow.luo
>>> Operation Engineer
>>> Asian Operation, WWOPS.
>>> Sun Microsystems of California Limited
>>> Phone : x58972/+86(0)20 85109972
>>> Mobile: +86-13929171806
>>> Email : arrow....@sun.com
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>             Article for Message ID:   *SUNOS-8000-1L*
>>>
>>>             
>>> ------------------------------------------------------------------------
>>>
>>>
>>>             *Eft cannot produce diagnosis*
>>>
>>>             *Type*
>>>
>>>                 Defect 
>>>
>>>             *Severity*
>>>
>>>                 Minor 
>>>
>>>             *Description*
>>>
>>>                 The EFT Diagnosis Engine encountered telemetry for which
>>>                 it is unable to produce a diagnosis. 
>>>
>>>             *Automated Response*
>>>
>>>                 Error reports from the component will be logged for
>>>                 examination by Sun. 
>>>
>>>             *Impact*
>>>
>>>                 Automated diagnosis and response for these events will
>>>                 not occur. 
>>>
>>>             *Suggested Action for System Administrator*
>>>
>>>                 Run pkgchk -n SUNWfmd to ensure that fault management
>>>                 software is installed properly. Contact Sun for support. 
>>>
>>>             *Details*
>>>
>>>                 The Message ID:   *SUNOS-8000-1L* indicates errors were
>>>                 detected but the module responsible for diagnosing those
>>>                 errors was unable to arrive at a diagnosis. This may be
>>>                 caused more commonly by downrev or incompatible
>>>                 revisions of software, or less frequently by bugs in the
>>>                 diagnosis module or subsystem.
>>>
>>>                 For SPARC based systems running Solaris 10, the kernel
>>>                 patch 127111 and Solaris FMA patch 119578 are
>>>                 particularly critical and should be examined. Ensure
>>>                 that any dependencies documented in the patch README
>>>                 files are noted and addressed.
>>>
>>>                 For x64 based systems running Solaris 10, the kernel
>>>                 patch 127112 and Solaris FMA patches 118344 and 125370
>>>                 are particularly critical and should be examined. Ensure
>>>                 that any dependencies documented in the patch README
>>>                 files are noted and addressed.
>>>
>>>                 If all patches appear to be current and all dependencies
>>>                 are properly addressed, error telemetry will need to be
>>>                 examined by Service personnel to make a further
>>>                 determination of the problem. Please see the information
>>>                 below for certain platform specific patch requirements.
>>>
>>>                 For T1000,T2000,Netra T2000,T5120,and T5220 systems,
>>>                 please make sure that the software and firmware patch
>>>                 levels are as follows:
>>>
>>>                 1.) Solaris 5.10: kernel patch 127111-08 or higher
>>>
>>>                 2.) Solaris 5.10: FMA Patch 119578-30 or higher
>>>
>>>                 3.) System firmware:
>>>
>>>                 T5120 and T5220 firmware patch 127580-05 or higher
>>>
>>>                 T2000 firmware patch 136927-01 or higher
>>>
>>>                 Netra T2000 firmware patch 136929-01 or higher
>>>
>>>                 T1000 firmware patch 127577-03 or higher
>>>
>>>                 *Please Note -* On some system platforms there could be
>>>                 additional information that may disclose a hardware
>>>                 fault, see the following:
>>>
>>>                 On Sun SPARC Enterprise Mx000 systems, please check the
>>>                 output of "showstatus" and "fmdump" from the xscfu for
>>>                 possible hardware faults.
>>>
>>>                 On Sun Fire 12K, 15K, 20K and E25K systems, please check
>>>                 the output of "showlogs -E -p e list 3" from the main
>>>                 system controller for information about possible
>>>                 hardware faults.
>>>
>>>                 On Sun Fire 3800, 48x0, 6800, v1280, E2900, E4900, E6900
>>>                 and Netra 1280/1290 systems, please check the output of
>>>                 "showlogs -v" from the main system controller for
>>>                 information about possible hardware faults.
>>>
>>>
>>>
>>>
>>>
>>>             
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>>
>>>             For more information, contact Sun Support
>>>             <http://www.sun.com/service/online/>.
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>>
>>> Event Registry on events.central:
>>> */events-gate*
>>>
>>>
>>>     defect.sunos.eft.undiagnosable_problem
>>>
>>>
>>> *Stability*
>>>
>>> Private
>>>
>>> *Description*
>>>
>>> EFT unable to produce a diagnosis defect
>>>
>>> *Sets containing this event*
>>>
>>> NotArcd <http://events.central/event/../set/FMA/NotArcd>
>>>
>>>
>>> *Event Payload version 0*
>>>
>>>
>>>      
>>> *Event Payload inherited from defect 
>>> <http://events/central/event/FMA/defect>*: 
>>>        *      Name            Type      Description*
>>>            certainty         uint8_t   Certainty [0..100] for this entry in list
>>>                class          string   The event class
>>>               module            fmri   The defective software module
>>>              package            fmri   The package containing the defective software
>>>              version         uint8_t   The major version of this event class
>>>      
>>> *Event Payload inherited from defect.sunos 
>>> <http://events/central/event/FMA/defect/sunos>*: NONE
>>>      
>>> *Event Payload inherited from defect.sunos.eft 
>>> <http://events/central/event/FMA/defect/sunos/eft>*: NONE
>>>
>>> *Event Payload:*
>>>        *      Name            Type      Description*
>>>               reason          string   text description of why eft unable to produce a diagnosis
>>>
>>>     
>>>       
>>   
>>     
>
>
>   
> ------------------------------------------------------------------------
>
> _______________________________________________
> fm-discuss mailing list
> fm-discuss@opensolaris.org
>   
