[email protected] writes:
> Hello,
>
> Thank you for using Dell EMC OMSA.
>
> Kindly note that, OMSA is not officially certified and supported on CentOS.
> However, we have tried to reproduce on R730 server and CentOS 6.8 64-bit. We
> are unable to reproduce the issue.
>
> Can you please help us providing more details about the issue?
>
> Whether you have installed directly OMSA8.4 version or it has been upgraded
> from different version?
In brief: it seems to fail on machines that are not running the latest
CentOS 6.8, i.e. those running 6.4/6.6.
All machines were upgraded to 8.4 last October and had been working fine
since. When I noticed what was happening, I removed 8.4 completely and
reinstalled it from scratch, with no change. By completely, I mean yum remove
followed by rm -rf /opt/dell.
> Can you share us the coredump or stack trace of the crash?
I don't get much out of either dsm_sa_eventmgrd or dsm_sa_snmpd. Here is
strace attached to dsm_sa_snmpd right after starting the services:
[root@host ]# /opt/dell/srvadmin/sbin/srvadmin-services.sh start ; strace -p $(ps -ef |awk '/\/opt\/dell\/srvadmin\/sbin\/dsm_sa_snmpd/{print $2}')
Starting Systems Management Device Drivers:
Starting dell_rbu: [ OK ]
Starting ipmi driver:
Already started [ OK ]
Starting Systems Management Data Engine:
Starting dsm_sa_datamgrd: [ OK ]
Starting dsm_sa_eventmgrd: [ OK ]
Starting dsm_sa_snmpd: [ OK ]
Starting DSM SA Shared Services: [ OK ]
Starting DSM SA Connection Service: [ OK ]
Process 12017 attached - interrupt to quit
rt_sigtimedwait([INT TERM], NULL, NULL, 8 <unfinished ...>
+++ killed by SIGSEGV +++
[root@host ~]#
> Have you upgraded any OS pkg's?
Since last July, no packages have been updated or installed other than
dell-system-update, dsucatalog and invcol. That's for the 6.4/6.6 systems.
> Any other useful data for reproducing the issue?
It's a bizarre problem: it affects machines that have seen no updates in the
past year other than OMSA/dsu etc. The output below is from a machine where
everything was working fine up until a few minutes ago. It runs CentOS 6.6
and had OMSA 8.4 installed last October.
[root@host log]# cat yum.log
Jan 24 11:25:35 Updated: dsucatalog-17.01.00-TDDR9.noarch
Jan 24 11:25:36 Updated: dell-system-update-1.3.1-17.01.00.x86_64
Jan 24 13:05:16 Erased: dsucatalog
Jan 24 13:17:57 Installed: dsucatalog-17.01.00-TDDR9.noarch
Jan 24 13:29:58 Installed: invcol_WF06C_LN64_16.12.200.896_A00-16.12.200.896-WF06C.x86_64
Feb 28 14:00:50 Updated: dsucatalog-17.02.00-WF25X.noarch
Feb 28 14:00:50 Updated: dell-system-update-1.4.0-17.02.00.x86_64
Mar 01 11:49:16 Erased: dsucatalog
Mar 01 12:01:41 Installed: dsucatalog-17.02.00-WF25X.noarch
[root@host log]#
[root@host log]# cd
[root@host ~]# /opt/dell/srvadmin/sbin/srvadmin-services.sh status
dell_rbu (module) is running
ipmi driver is running
dsm_sa_datamgrd (pid 31664 31382) is running
dsm_sa_eventmgrd (pid 31617) is running
dsm_sa_snmpd (pid 31645) is running
dsm_om_shrsvcd (pid 31898) is running
dsm_om_connsvcd (pid 31964 31963) is running
[root@host log]# /opt/dell/srvadmin/sbin/srvadmin-services.sh stop
Shutting down DSM SA Shared Services: [ OK ]
Shutting down DSM SA Connection Service: [ OK ]
Stopping Systems Management Data Engine:
Stopping dsm_sa_snmpd: [ OK ]
Stopping dsm_sa_eventmgrd: [ OK ]
Stopping dsm_sa_datamgrd: [ OK ]
Stopping Systems Management Device Drivers:
Stopping dell_rbu: [ OK ]
[root@host log]# ps -ef |grep dsm
root 27396 18309 0 17:50 pts/1 00:00:00 grep dsm
[root@host log]# /opt/dell/srvadmin/sbin/srvadmin-services.sh start
Starting Systems Management Device Drivers:
Starting dell_rbu: [ OK ]
Starting ipmi driver:
Already started [ OK ]
Starting Systems Management Data Engine:
Starting dsm_sa_datamgrd: [ OK ]
Starting dsm_sa_eventmgrd: [ OK ]
Starting dsm_sa_snmpd: [ OK ]
Starting DSM SA Shared Services: [ OK ]
Starting DSM SA Connection Service: [ OK ]
[root@host log]# /opt/dell/srvadmin/sbin/srvadmin-services.sh status
dell_rbu (module) is running
ipmi driver is running
dsm_sa_datamgrd (pid 27888 27616) is running
dsm_sa_eventmgrd is stopped
dsm_sa_snmpd is stopped
dsm_om_shrsvcd (pid 27937) is running
dsm_om_connsvcd (pid 28005 28004) is running
Mar 10 17:51:11 host kernel: dsm_sa_eventmgr[27854]: segfault at 0 ip 00007f24a56e6220 sp 00007f24a640e0f8 error 4 in libc-2.12.so[7f24a55bd000+18a000]
Mar 10 17:51:11 host snmpd[2130]: [smux_process] peek failed: Success
Mar 10 17:51:11 host kernel: dsm_sa_snmpd[27882]: segfault at 7fc800000000 ip 00007fc8139d2220 sp 00007fc8146d9828 error 4 in libc-2.12.so[7fc8138a9000+18a000]
Mar 10 17:52:47 host kernel: MaserIE[28384]: segfault at 7f8b00000000 ip 00007f8bda6f0220 sp 00007ffc0e967098 error 4 in libc-2.12.so[7f8bda5c7000+18a000]
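
A side note on the three segfault lines above: subtracting the libc mapping
base (the first number in the square brackets) from the faulting instruction
pointer (the "ip" value) gives the offset of the crash inside libc-2.12.so.
Below is a minimal sketch of that arithmetic in shell. The name fault_offset
is a hypothetical helper made up for this example; it is not part of OMSA or
of any Dell tool.

```shell
# fault_offset IP BASE: print IP - BASE in hex.
# Both arguments are hex strings without the 0x prefix, exactly as they
# appear in the kernel message "... ip IP ... in lib[BASE+SIZE]".
# (fault_offset is a hypothetical helper name used only for this sketch.)
fault_offset() {
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

# ip and mapping base copied from the three segfault lines above
fault_offset 00007f24a56e6220 7f24a55bd000   # dsm_sa_eventmgr
fault_offset 00007fc8139d2220 7fc8138a9000   # dsm_sa_snmpd
fault_offset 00007f8bda6f0220 7f8bda5c7000   # MaserIE
```

All three calls print 0x129220, so the three processes appear to fault at
the same location inside libc. Which libc function that offset belongs to
cannot be told from these logs alone; that would need the symbols for this
exact glibc build.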
_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge