Looking at the iprop code, I've noticed that it does more msync()s than it probably needs to. I suspect this comes from someone thinking it'd be better to msync() only the bytes actually written, but it isn't: each synchronous msync() acts as a write() plus an fsync(), and it's the fsync() that kills performance, not the write(). OTOH, msync() might instead act as a write() of the entire range passed to it (as opposed to a set of pwrite()s of just the dirty pages), in which case an msync() of the whole ulog could be very painful. It'd be useful to know how msync() behaves on the various OSes.
I'm starting to think that we should use mmap() only for reading the log, and for writing we should pwrite() and then fsync() once when we're done. This can probably be done with relatively little violence to the code. Or am I missing something?

All writes to the ulog are done under an exclusive lock, so msync()ing parts of it at a time isn't serving any synchronization purpose.

I think what we want is something like a guarantee that several msync()s with MS_ASYNC, followed by an fsync(), will cause the msync()ed regions to be written to disk before the fsync() returns. But that's not what msync()'s semantics promise.

Nico
--
________________________________________________
Kerberos mailing list
https://mailman.mit.edu/mailman/listinfo/kerberos
