On Thu, May 17, 2007 at 02:57:22PM -0400, Patrick Geoffray wrote:
> gshipman wrote:
> >> The fork() problem is due to memory registration aggravated by
> >> registration cache. Memory registration in itself is a hack from
> >> the OS point of view, and you already know a lot about the various
> >> problems related to registration cache.
> >>
> > So Gleb is indicating that this is a problem in the pipeline protocol
> > which does not use a registration cache. I think the registration
> > cache, while increasing the probability of badness after fork, is not
> > the culprit.
>
> Indeed, it makes things worse by extending the vulnerability outside the
> time frame of an asynchronous communication. Without the registration
> cache, the bad case is limited to a process that forks while a com is
> pending and touches the same pages before they are read/written by the
> hardware. This is not very likely because the window of time is very
> small, but still possible. However, it is not limited to the last
> partial page of the buffer, it can happen for any pinned page.
>
Now I see that you don't fully understand all of the IB ugliness, so let
me explain. In IB, the QP and CQ also use registered memory that is read
and written directly by the hardware (to signal a completion or to fetch
the next work request). After fork() the parent of course continues to
use IB and will most definitely touch QP/CQ memory. At that very moment
everything breaks: the write triggers copy-on-write, the parent gets a
new physical page, and the hardware keeps using the old one.

To overcome this problem (and to allow an IB program to fork() at all), a
new madvise() flag (MADV_DONTFORK) was implemented that lets userspace
mark certain memory so that it is not copied to a child process. Such
memory is not mapped in the child at all; not even a VMA is created for
it. In the parent this memory is not marked COW. All memory that is
registered by IB is marked this way.

So the problem is that if a non-aligned buffer is handed to MPI, it may
share a page with other data that the child wants to use, and that data
will not be present in the child.

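To make the last point concrete, here is a minimal sketch of the effect,
assuming Linux and Python 3.8 or later (for mmap.madvise and the
MADV_DONTFORK constant). It is not how the IB stack itself is written;
the helper names pages_covering and child_can_read are made up for this
illustration. The first helper shows that the range hidden from the
child is rounded out to whole pages, so an unaligned buffer drags its
neighbours on the same page along with it. The second forks and checks
whether the child can still read a private mapping.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def pages_covering(addr, length, page_size=PAGE):
    """Return (start, end) of the page-aligned range containing the buffer.

    madvise() operates on whole pages, so this is the range that is
    actually hidden from the child, not just [addr, addr + length).
    """
    start = addr - (addr % page_size)                    # round down
    end = -(-(addr + length) // page_size) * page_size   # round up
    return start, end


def child_can_read(dontfork):
    """Fork and report whether the child can still read a private mapping."""
    buf = mmap.mmap(-1, PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    buf[:5] = b"hello"
    if dontfork:
        # Linux-only: the kernel will not create this mapping in the child.
        buf.madvise(mmap.MADV_DONTFORK)
    pid = os.fork()
    if pid == 0:
        # Child: if the page was not inherited, this read faults and the
        # child is killed by a signal instead of reaching os._exit().
        ok = buf[:5] == b"hello"
        os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    buf.close()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

With 4096-byte pages, a 200-byte buffer starting 4000 bytes into a page
spans two pages, and everything else living on those two pages goes
missing in the child as well. On Linux, child_can_read(False) should
report that the child read the data, while child_can_read(True) should
report that it could not.
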
-- Gleb.