Hello, I've been getting error messages like the ones below on my OSS nodes, and I'm wondering whether a client is sending bad data over RDMA. Googling has turned up nothing on the meaning of "fault reason 34", but the PCI addresses correspond to my 10Gb and 1Gb NICs.
I'm not really sure where to begin diagnosing this, so I'm hoping one of you has seen it before. One thing to note: no clients should be using the 1Gb NIC to mount the file system; it's just for management, so I don't know why I'd see a DMA fault on PCI 04:00.1.

dmar: INTR-REMAP: Request device [[81:00.1] fault index 8c
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
dmar: DRHD: handling fault status reg 102
dmar: INTR-REMAP: Request device [[04:00.1] fault index 75
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
dmar: DRHD: handling fault status reg 202
dmar: INTR-REMAP: Request device [[04:00.1] fault index 74
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
dmar: DRHD: handling fault status reg 302
dmar: INTR-REMAP: Request device [[04:00.1] fault index 73
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear

kernel: 2.6.32-504.23.4.el6.x86_64
lustre: lustre-2.7.58-2.6.32_504.23.4.el6.x86_64_g051c25b.x86_64
zfs: zfs-0.6.4-76_g87abfcb.el6.x86_64
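In case it helps anyone reproduce my setup, here is a minimal sketch of the first checks I'm planning to run, assuming lspci is installed; the PCI addresses come from the fault messages above, and nothing here changes system state:

```shell
# Identify which devices sit at the PCI addresses named in the faults
# (expected to be the 10Gb and 1Gb NICs, per the post above).
lspci -s 81:00.1 || true
lspci -s 04:00.1 || true

# Check whether interrupt remapping was forced on or off at boot
# (the intremap= and intel_iommu= parameters would show up here).
cat /proc/cmdline

# Pull any DMAR / INTR-REMAP messages out of the kernel ring buffer.
dmesg | grep -i -e DMAR -e 'INTR-REMAP' || true
```

If the faults turn out to be spurious, booting with `intremap=off` is one commonly suggested workaround, though I'd rather understand the root cause first.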
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
