On Wed, Aug 17, 2011 at 8:57 PM, Joe Landman <[email protected]> wrote:
> On 08/17/2011 10:43 PM, John Hanks wrote:
>
> As a rule of thumb, you should try to keep the path to swap as simple as
> possible. No memory/buffer allocations on the way to a paging event if
> you can possibly do this.

I do have a long path there; I will try simplifying that and see if it helps.

> The lustre client (and most NFS or even network block devices) all do
> memory allocation of buffers ... which is anathema to migrating pages
> out to disk. You can easily wind up in a "death spiral" race condition
> (and it sounds like you are there). You might be able to do something
> with iSCSI or SRP (though these also do block allocations and could
> trigger death spirals). If you can limit the number of buffers they
> allocate, and then force them to allocate the buffers at startup (by
> forcing some activity to the block device, and then pin this memory so
> that they can't be ejected ...) you might have a chance to do it as a
> block device. I think SRP can do this, not sure if iSCSI initiators can
> pin buffers in ram.
>
> You might look at the swapz patches (we haven't integrated them into our
> kernel yet, but have been looking at it) to compress swap pages and
> store them ... in ram. This may not work for you, but it could be an
> option.

I wasn't aware of swapz; that sounds really interesting. The codes that
run the nodes out of memory tend to be sequencing applications, which
seem like good candidates for memory compression.

> Is there any particular reason you can't use a local drive for this
> (such as you don't have local drives, or they aren't big/fast enough)?

We're doing this on diskless nodes. I'm not looking to get a huge amount
of swap, just enough to provide a place for the root filesystem to page
out of the tmpfs so we can squeeze out all the RAM possible for
applications. Since I don't expect it to get heavily used, I'm
considering running vblade on a server and carving out small AoE LUNs.
It seems logical that if a host can boot off of iSCSI or AoE, you could
have swap space there, but I've never tried it with either protocol.

FWIW, mounting a file on Lustre via loopback to provide a local scratch
filesystem works really well.

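For anyone who wants to try the loopback approach, here is a minimal sketch of the steps in Python. It only builds and prints the commands (a dry run), so nothing is executed. The backing file path, size, mount point, and the choice of ext4 are all assumptions for illustration, not details of the setup described above, and the helper name loopback_scratch_commands is hypothetical.

```python
import shlex


def loopback_scratch_commands(backing_file, size_mib, mount_point, fstype="ext4"):
    """Return argv lists for creating and mounting a loopback scratch filesystem.

    All arguments are illustrative assumptions; adjust them to the local site.
    """
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    return [
        # Create the backing file (here assumed to live on the Lustre mount).
        ["dd", "if=/dev/zero", f"of={backing_file}", "bs=1M", f"count={size_mib}"],
        # -F tells mke2fs to proceed even though the target is a regular file.
        [f"mkfs.{fstype}", "-F", backing_file],
        # Make sure the mount point exists on the node.
        ["mkdir", "-p", mount_point],
        # Mount the file through a loop device.
        ["mount", "-o", "loop", backing_file, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical paths; print the commands instead of running them.
    for argv in loopback_scratch_commands("/lustre/scratch/node01.img", 1024, "/scratch"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before running anything as root on a diskless node.
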
jbh
