At 06:04 PM 10/3/2007, Roland Dreier wrote:
> > I would look into the dma_sync behavior on the receiver. Especially
> > on an Opteron, it's critical to synchronize the iommu and cachelines
> > to the right memory locations. Since the FMR code hides some of
> > this, it may be a challenge to trace. Can you try another memory
> > registration strategy? NFS/RDMA can do that, for example.
>
>I think this is a red herring. Every IB HCA does 64-bit DMA, which
>means it bypasses all the Opteron iommu/swiotlb stuff.
>
>Also FMR doesn't hide any DMA mapping stuff; it is completely up to
>the consumer to handle all the DMA mapping, because FMRs operate
>completely at the level of bus (HCA DMA) addresses.

Fair enough, but the FMR *pools* still worry me, because they manage internal registrations and defer their manipulation. Depending on lots of things beyond the consumer's control, they sometimes don't even close the handles advertised to the RDMA peer. Bypassing the pools and going directly to the FMRs themselves avoids this (which is what NFS/RDMA does), but iSER and SRP both use the pool API, don't they?

So, what else sends an RDMA write into the weeds? Short of writing to the wrong address, it sure sounds like a dma consistency thing to me. The connection wasn't lost, so it's not an error.

Tom.
