Hi everybody,
and hopefully Dell tech as well, since I believe this is directly
hardware related.
I managed to segfault OMSA (which then crashes the whole system):
[ 1117.103438] dsm_sa_datamgrd[28952]: segfault at 0 ip 00007f2e1ab57b46 sp 00007f2e1197c020 error 4 in libdsm_sm_sasvil.so[7f2e1aae8000+bb000]
This happens simply by having one H700 in one specific PCIe slot
in my R815 servers.
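
For anyone who wants to read more out of that kernel line, here is a minimal Python sketch that decodes it. It assumes the standard x86 page-fault error-code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode); the function name and dictionary keys are my own and are not part of OMSA or any Dell tool.

```python
import re

# Kernel log line from the report above, joined onto one line.
LOG = ("dsm_sa_datamgrd[28952]: segfault at 0 ip 00007f2e1ab57b46 "
       "sp 00007f2e1197c020 error 4 in libdsm_sm_sasvil.so[7f2e1aae8000+bb000]")

# Layout of an x86 "segfault at ..." message as printed by the kernel.
PATTERN = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r" in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: split a segfault line into readable fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not an x86 segfault line")
    err = int(m["err"])
    return {
        "process": m["proc"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        "library": m["lib"],
        # Instruction pointer relative to where the library is mapped.
        "offset_in_library": int(m["ip"], 16) - int(m["base"], 16),
        # Assumed x86 page-fault error-code bits.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


info = decode_segfault(LOG)
print(hex(info["offset_in_library"]))  # 0x6fb46
print(info["access"], info["mode"], info["fault_address"])  # read user 0
```

Read this way, the line describes a user-mode read of address 0, which looks like a NULL pointer dereference at offset 0x6fb46 inside libdsm_sm_sasvil.so. If symbols for that library were available, that offset could be given to something like addr2line -e, but I have not checked whether Dell ships them.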
The server setup is somewhat unusual, and I stumbled upon
this segfault purely by chance.
I have an "embedded" H200 in the "integrated storage controller
card slot".
I have a Dell Broadcom 4-port NIC in "expansion-card slot 2".
I have an H700 in "expansion-card riser 1".
Lastly, I have an H800 in "expansion-card slot 5".
The H700 was installed because we are going to move the HDD array
from the H200 to the H700 (but not just yet, so for now only the
H700 card itself is installed).
1) When the H700 is in "expansion-card slot 2", everything
works perfectly fine, with no segfaults.
But "expansion-card slot 1" is PCIe x8, which matches the H700,
while "expansion-card slot 2" is only PCIe x4.
2) The segfault seems to occur only when I run an omxxx storage
command against that specific H700 controller:
$ omreport storage vdisk vdisk=0 controller=1 - H200, no segfault
$ omreport storage vdisk vdisk=0 controller=0 - H700, segfault!
$ omreport system summary - does not cause it
About 20 seconds after the segfault, the system goes through a
hard/cold power cycle.
So this seems critical, and I would expect the tech team to
investigate it.
I am going to file a tech support request separately, but felt
I should share this with other R815 users.
b.w.&r.
L
_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge