Hi All,
I am facing an issue on Marvell's ARM64 ThunderX2 with the kdump kernel.
Here the network card continuously reports the following AER error:
[ 100.839168] igb 0000:09:00.1: AER: aer_status: 0x00004000, aer_mask: 0x00000000
[ 100.846463] igb 0000:09:00.1: AER: [14] CmpltTO (First)
[ 100.861491] igb 0000:09:00.1: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
[ 100.869400] igb 0000:09:00.1: AER: aer_uncor_severity: 0x00062011
This error is not 100% reproducible. It happens in about 1 out of 4 tries.
The error goes away in the following two scenarios:
A) Set the IOMMU in bypass mode via the boot argument iommu.passthrough=1
B) Wait for ~100ms in arm_smmu_device_reset() in drivers/iommu/arm-smmu-v3.c:
	if (reg & CR0_SMMUEN) {
		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
		WARN_ON(is_kdump_kernel() && !disable_bypass);
		mdelay(100); <-- Added delay
		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
	}
From A), it is clear that the problem is related to the IOMMU.
From B), it looks like the network card is still active while the kdump
kernel boots and has already sent some requests over PCIe.
Since GBPA_ABORT is set, no response/completion comes back to the PCIe
controller, hence the "CmpltTO" error.
Ideally, before setting GBPA_ABORT, there should be some check for
active transactions. If that is not possible, a wait should be done to
ensure that no pending transactions are left.
Why has no such delay been considered?
--pk
_______________________________________________
kexec mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/kexec