Hi everybody,
and hopefully Dell tech folks too, as I believe this is directly hardware-related.

I have managed to segfault OMSA (which then crashes the whole system):

[ 1117.103438] dsm_sa_datamgrd[28952]: segfault at 0 ip 00007f2e1ab57b46 sp 00007f2e1197c020 error 4 in libdsm_sm_sasvil.so[7f2e1aae8000+bb000]

This happens simply by having an H700 in one specific PCI slot in my R815 servers. The servers are set up in a somewhat unusual way, and I stumbled upon this segfault purely by chance:

I have an "embedded" H200 in the "integrated storage controller card slot"
I have a Dell Broadcom 4-port NIC in "expansion-card slot 2"
I have an H700 in "expansion-card riser 1"
Lastly, I have an H800 in "expansion-card slot 5"

The H700 was installed because we are going to move the HDD array from the H200 to the H700 (but not just yet, so for now we have only installed the H700 card).

1) When the H700 is in "expansion-card slot 2", everything works perfectly fine, with no segfaults. However, "expansion-card slot 1" is PCIe x8, which matches the H700, whereas "expansion-card slot 2" is only PCIe x4.

2) The segfault seems to occur only when I run omxxx storage commands against that specific H700 controller:

$ omreport storage vdisk vdisk=0 controller=1   (H200: no segfault)
$ omreport storage vdisk vdisk=0 controller=0   (H700: segfault!)

Running "omreport system summary" does not trigger it.
About 20 seconds after the segfault, the system goes through a hard (cold) power cycle.
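As a side note that was not part of the original report: the kernel "segfault at" line quoted at the top can be decoded mechanically, which may help when filing the support request. Below is a minimal Python sketch, assuming the x86 log format shown in this post (decode_segfault and SEGFAULT_RE are hypothetical helper names). It computes the faulting instruction's offset inside libdsm_sm_sasvil.so (ip minus the mapping base) and splits the x86 page-fault error code into its bits. For the line above it gives offset 0x6fb46 and a user-mode read of a not-present page at address 0, which is consistent with a NULL pointer read inside the OMSA library.

```python
import re

# Assumed format: the x86 kernel "segfault at" line exactly as quoted in this post.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Hypothetical helper: split a kernel segfault line into useful fields."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    return {
        "process": m["proc"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "library": m["lib"],
        # Offset of the faulting instruction within the library mapping.
        "offset": ip - base,
        # x86 page-fault error code bits:
        # bit 0 set = protection violation, clear = page not present
        "present": bool(err & 0x1),
        # bit 1 set = write access, clear = read access
        "write": bool(err & 0x2),
        # bit 2 set = fault happened in user mode
        "user": bool(err & 0x4),
    }


if __name__ == "__main__":
    log = ("[ 1117.103438] dsm_sa_datamgrd[28952]: segfault at 0 "
           "ip 00007f2e1ab57b46 sp 00007f2e1197c020 error 4 "
           "in libdsm_sm_sasvil.so[7f2e1aae8000+bb000]")
    info = decode_segfault(log)
    print(info["library"], hex(info["offset"]))
```

The offset is the value one would quote to Dell, or look up with a tool such as addr2line or objdump if symbols for the library were available.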

So this seems critical, and I would expect the tech team to investigate it. I am going to file a separate tech support request, but I felt I should share this with other R815 users as well.

b.w.&r.
L


_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
