> I think Patrick's point is that it's not too much more expensive to do the
> syscall on Linux vs just doing the cache lookup, particularly in the
> context of a long message. And it means that upper layer protocols like
> MPI don't have to deal with caches (and since MPI implementors hate
> registration caches only slightly less than we hate MPI_CANCEL, that will
> make us happy).
Stick it in a separate library then? I don't think we want the complexity in the kernel -- I personally would argue against merging it upstream; and given that the userspace solution is actually faster, it becomes pretty hard to justify.