> This looks relevant:
> http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html (see
> comments for directions to code sample)

Thanks. That's helpful; I've tried to avoid JNI in the past, so I wasn't familiar with the API, and the main difficulty was likely to be how best to expose the functionality to Java. Having someone do almost exactly the same thing helps ;) I'm also glad they confirmed the effect in a very similar situation.

I'm also leaning towards O_DIRECT, because:

(1) Even if posix_fadvise() is used, on writes you'll need to fsync() before fadvise() anyway in order to allow Linux to evict the pages (a theoretical OS implementation might remember the advice call, but Linux doesn't - at least it didn't until recently).

(2) posix_fadvise() feels more obscure and less portable than O_DIRECT, the latter being well understood and used by e.g. databases for a long time.

(3) O_DIRECT allows more direct control over when I/O happens and to what extent (without playing tricks or making assumptions about e.g. read-ahead), so it will probably make it easier to kill both birds with one stone.

You indicated you were skeptical about writing an I/O scheduler. While I agree that writing a real I/O scheduler is difficult, I suspect that if we do direct I/O, a fairly simple scheme should work well. Being able to tweak a target MB/sec rate, select a chunk size, and select the time window over which to rate limit would, I suspect, go a long way. The situation is a bit special, since in this case we are talking about one type of I/O that is run under controlled circumstances (controlled concurrency, we know how much memory we eat in total, etc).

I suspect there may be a problem sustaining rates during high read loads, though. We'll see.

I'll try to make time for trying this out.

--
/ Peter Schuller
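
The simple scheme mentioned above (a target rate, a chunk size, and a window over which to rate limit) could look roughly like the following Java sketch. It is only an illustration under assumptions: the class name `ChunkRateLimiter` and the method `reserve` are hypothetical, it only does the byte accounting, and it does not perform any direct I/O itself. The caller is assumed to pass in the current time and to sleep for the returned number of milliseconds before issuing the chunk.

```java
// Hypothetical sketch: fixed-window byte budget for chunked background I/O.
// Not taken from any existing code base; names are made up for illustration.
final class ChunkRateLimiter {
    private final long budgetBytesPerWindow; // bytes allowed per window
    private final long windowMillis;         // length of the rate-limiting window
    private long windowStartMillis;          // start of the window being charged
    private long bytesUsedInWindow;          // bytes already charged to that window

    ChunkRateLimiter(long targetBytesPerSec, long windowMillis, long nowMillis) {
        if (targetBytesPerSec <= 0 || windowMillis <= 0) {
            throw new IllegalArgumentException("rate and window must be positive");
        }
        this.windowMillis = windowMillis;
        this.budgetBytesPerWindow = Math.max(1, targetBytesPerSec * windowMillis / 1000);
        this.windowStartMillis = nowMillis;
    }

    /**
     * Charges one chunk against the budget and returns how many milliseconds
     * the caller should sleep before actually issuing the I/O (0 = go ahead).
     */
    synchronized long reserve(long nowMillis, long chunkBytes) {
        // Start a fresh window once the current one has fully elapsed.
        if (nowMillis - windowStartMillis >= windowMillis) {
            windowStartMillis = nowMillis;
            bytesUsedInWindow = 0;
        }
        // Always let the first chunk of a window through, so a chunk larger
        // than the whole budget cannot stall forever.
        if (bytesUsedInWindow == 0 || bytesUsedInWindow + chunkBytes <= budgetBytesPerWindow) {
            bytesUsedInWindow += chunkBytes;
            return 0;
        }
        // Budget exhausted: charge the chunk to the next window and tell the
        // caller to wait until that window begins.
        long waitMillis = windowStartMillis + windowMillis - nowMillis;
        windowStartMillis += windowMillis;
        bytesUsedInWindow = chunkBytes;
        return waitMillis;
    }
}
```

A caller would loop over chunks, call `reserve(System.currentTimeMillis(), chunkSize)`, `Thread.sleep()` for the returned delay if it is non-zero, and then do the read or write. A short window gives smoother I/O at the cost of more wakeups; a long window allows bursts. Whether this holds the target rate under heavy competing read load is exactly the open question raised above.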