> So we can map the device memory with WB or WT semantics, and movnt will
> enable WC. And the nice thing about this trick, is that both WB and WT
> *are already programmed into PAT after reset*, which means that we can
> use them for pages we map for userspace, without stepping on anyone's
> toes or waiting for the generic in-kernel support for WC to materialize.

I'm not sure whether this is much of an advantage. There's no generic
way to map memory with WB that I know of. I don't think that setting a
PAT entry for WC is the hold-up -- the problem is more in the right
infrastructure for pgprot_xxx(). I don't think it's very nice to have
#ifdef __x86_64__ in a driver.

> I attach a header file that implements WC memcpy with these
> instructions for lengths from 16 to 128 bytes (and one can,
> naturally, just call xmm_copy64 in a loop), that I wrote for fun
> at some point. Feel free to read/flame/reuse in any way you like.

Using movntdq means we have to save off xmm's, and it's a hassle to get
a properly aligned block to be able to use movdqa to save them (you
can't rely on the stack being 16-byte aligned). I'd be curious to see
whether it's even worth it for a 64-byte copy (which is probably the
most common case for BF), since you need 8 extra movdqa to save/restore
the xmms on top of 4 movdqa to load the WQE and 4 movntdq to write it.

Just plain movnti might be the simplest thing to do, since 16 movnti is
all you would need, and I think that comes out to be smaller code than
12 movdqa + 4 movntdq.

(Optimizing the WQE copy in assembly might be worth it independent of
how we map the BF page for WC, since obviously posting BF sends is a
super-hot path. And it's fun to write SSE code anyway)

 - R.
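
For reference, the plain-movnti variant discussed above could look
roughly like the sketch below. It is an illustration only, not code from
this thread: wc_copy64_movnti is a hypothetical name, the WQE is assumed
to be exactly 64 bytes, and the destination is assumed to be 4-byte
aligned. It uses the SSE2 intrinsic _mm_stream_si32, which compiles to a
32-bit movnti, so no xmm registers are touched and nothing needs to be
saved or restored.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si32 (movnti); pulls in _mm_sfence */
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy one 64-byte WQE using sixteen 32-bit
 * non-temporal stores. Assumes dst is 4-byte aligned; src may be
 * any readable 64-byte buffer.
 */
static void wc_copy64_movnti(void *dst, const void *src)
{
    int *d = (int *)dst;
    const unsigned char *s = (const unsigned char *)src;
    int i;

    assert(dst != NULL && src != NULL);

    for (i = 0; i < 16; i++) {
        int w;

        memcpy(&w, s + 4 * i, sizeof w);  /* ordinary 32-bit load */
        _mm_stream_si32(d + i, w);        /* non-temporal store (movnti) */
    }

    /* Make the non-temporal stores globally visible before later stores. */
    _mm_sfence();
}
```

The loop is written for readability; a compiler will typically unroll
it into the 16 movnti stores counted above, but that is not guaranteed.
On a 64-bit build, _mm_stream_si64 would halve the number of stores, at
the cost of exactly the kind of architecture-specific #ifdef that the
reply objects to.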
