> 000: 17 00 00 00 17 00 00 00 18 00 00 00 18 00 00 00
> 010: 19 00 00 00 19 00 00 00 1a 00 00 00 1a 00 00 00
> 020: 1b 00 00 00 1b 00 00 00 1c 00 00 00 1c 00 00 00
> 030: 1d 00 00 00 1d 00 00 00 1e 00 00 00 1e 00 00 00
> 040: 1f 00 00 00 1f 00 00 00 00 00 00 00 00 00 00 00
> 050: 01 00 00 00 01 00 00 00 02 00 00 00 02 00 00 00

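For anyone who wants to scan a larger dump for the same corruption, here is a rough user-space sketch of a checker. It assumes my reading of the bytes above: 8-byte entries, each holding the same little-endian 32-bit value twice, with the value counting up by one and wrapping from 1f back to 00. The names dump, le32() and is_counter_pattern() are made up for this example and are not part of mthca.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes copied from the quoted dump, offsets 0x000 through 0x05f. */
static const uint8_t dump[96] = {
    0x17, 0, 0, 0, 0x17, 0, 0, 0, 0x18, 0, 0, 0, 0x18, 0, 0, 0,
    0x19, 0, 0, 0, 0x19, 0, 0, 0, 0x1a, 0, 0, 0, 0x1a, 0, 0, 0,
    0x1b, 0, 0, 0, 0x1b, 0, 0, 0, 0x1c, 0, 0, 0, 0x1c, 0, 0, 0,
    0x1d, 0, 0, 0, 0x1d, 0, 0, 0, 0x1e, 0, 0, 0, 0x1e, 0, 0, 0,
    0x1f, 0, 0, 0, 0x1f, 0, 0, 0, 0x00, 0, 0, 0, 0x00, 0, 0, 0,
    0x01, 0, 0, 0, 0x01, 0, 0, 0, 0x02, 0, 0, 0, 0x02, 0, 0, 0,
};

/* Read one little-endian 32-bit word from p. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Return 1 if buf is a run of 8-byte entries where both 32-bit words of
 * each entry are equal and the value increases by one, modulo 32, from
 * one entry to the next.  Return 0 otherwise.  The entry layout is an
 * assumption taken from the dump, not from any hardware documentation.
 */
static int is_counter_pattern(const uint8_t *buf, size_t len)
{
    uint32_t expected;
    size_t off;

    if (len == 0 || len % 8 != 0)
        return 0;

    expected = le32(buf);
    if (expected > 0x1f)
        return 0;

    for (off = 0; off < len; off += 8) {
        if (le32(buf + off) != expected || le32(buf + off + 4) != expected)
            return 0;
        expected = (expected + 1) & 0x1f;   /* wrap 1f -> 00 */
    }
    return 1;
}
```

Calling is_counter_pattern(dump, sizeof dump) on the bytes above reports a match, so the same function could be pointed at other suspect regions to see how far the pattern extends.
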
OK, my guess right now would be that when the driver tries to give memory to the HCA to use for its internal hardware data structures, the bus addresses given to the HCA end up being wrong for some reason.

There could be a bug in mthca, but since this code is working fine on lots of non-Xen systems (and not just i386/x86-64 but also ppc and ia64 at least), right now I would be more suspicious of a bug in the Xen domU's pci_map_sg() or something like that.

You can look in mthca_memfree.c, specifically mthca_alloc_icm(), to see how the memory given to the HCA is allocated and mapped. I gave it a quick look over and the way the DMA mapping API is used looks OK to me, but perhaps there is a subtle problem that is exposed by Xen. Although as I said before, right now I think it's more likely that we are hitting a bug in the Xen domU implementation of DMA mapping.

Michael, does my guess about the source of corruption make sense? Is that pattern of every fourth byte counting up 00 ... 1f something the HCA would write during initialization of ICM?

 - R.
