> Fair enough, but the FMR *pools* still worry me, because they manage
> internal registrations and defer their manipulation. Depending on lots
> of things beyond the consumer's control, they sometimes don't even
> close the handles advertised to the RDMA peer.
The FMR pool stuff (especially with caching turned off, as the iSER
initiator uses the API) isn't really doing anything particularly fancy.
It just keeps a list of FMRs that are available to remap, and batches up
the unregistration. It is true that an R_Key may remain valid after an
FMR is unmapped, but that's the whole point of FMRs: if you don't batch
up the real flushing to amortize the cost, they're really no better than
regular MRs.

> So, what else sends an RDMA write into the weeds? Short of writing
> to the wrong address, it sure sounds like a dma consistency thing to
> me. The connection wasn't lost, so it's not an error.

I don't have that feeling. x86 systems are really pretty strongly
consistent with respect to DMA when you're not using any of the
GART/IOMMU stuff, so I think it's more likely that either the wrong
address is being given to the HCA somehow, or the mthca FMR
implementation is making the HCA write to the wrong address. Especially
since the correct data never shows up even after a long time, it seems
that the data must just be going to the wrong place.

Given that there was an FMR bug with 1-port Mellanox HCAs that caused
iSER corruption, I would like to make sure that the same thing isn't
hitting here as well. Reproducing on 2.6.22 or 2.6.23-rcX (which have
the bug fixed) would rule that out, as would seeing the bug on anything
but a 1-port Mellanox HCA.

 - R.

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
