On Mon, Oct 14, 2013 at 3:32 PM, Saso Kiselkov <[email protected]> wrote:
> On 10/14/13 11:17 PM, Matthew Ahrens wrote:
> > That message is about failing due to running out of memory, which your
> > changes don't address. Your changes address running out of virtual
> > address space.
>
> Actually no, in that case the machine was running "low on memory" in the
> sense of there being very few large unallocated chunks. They had 128GB of
> physical, of which 90GB was used by ARC - not exactly what you'd call a
> traditional low memory situation (the ARC being considered expendable).
> What happened was that the kernel address space got heavily fragmented
> and the ARC wasn't feeling much pressure to contract, so larger
> allocations started failing.

How did you come to that conclusion? E.g. how did you measure kernel
address space fragmentation?

I wonder what exactly the OP meant by "VM slab allocator is quite
fragmented". It may mean that there are many buffers not in use (e.g. as
measured by ::kmastat's "buf total" - "buf in use"). This does not
indicate virtual address space fragmentation.

I see your fundamental premise as: In some situations, a single large
memory allocation may fail, but several smaller allocations (totaling the
same amount of memory) would succeed. This situation may occur when the
zfs kernel module is loaded. Would you agree?

To me, this premise is plausible in a virtual-address-constrained system
(e.g. 32-bit), if the kernel module is loaded some time after booting
(e.g. on Linux). Are you addressing a 32-bit-only problem, or do you
contend that the problem also exists on 64-bit systems?

> What compounded the situation was that the allocation originated in the
> ARC itself and was KM_SLEEP, which meant that after a journey through the
> VM subsystem, the ARC *did* attempt to contract to accommodate the new
> allocation, but it deadlocked (due to having reentered itself again).
>
> In the hash table allocation case this deadlock situation fortunately
> can't happen (KM_NOSLEEP), but what will happen is that the buf_init
> routine will retry after lowering its request size, potentially resulting
> in a much smaller hash table with far worse performance. The impact this
> can have on performance can be very hard to diagnose and, since the
> problem is transient (after a reboot it's most likely gone), very hard to
> pin down.

I agree that silently reducing the hash table size is probably a bad idea.

--matt
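
For reference, here is a minimal user-space sketch of the retry-halving
behavior being discussed. It only mirrors the pattern (the real buf_init()
in arc.c allocates with kmem_zalloc(..., KM_NOSLEEP) in the kernel); the
sizes, variable names, and the use of calloc() are illustrative
assumptions chosen so the fallback path is actually exercised:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Illustrative sketch only: ask for a hash table sized far too large so
 * the single big allocation fails, then halve the request until it
 * succeeds, mirroring the KM_NOSLEEP retry loop described above. All
 * numbers here are made up.
 */
int
main(void)
{
	uint64_t hsize = 1ULL << 40;	/* deliberately absurd bucket count */
	void *table;

	while ((table = calloc(hsize, sizeof (void *))) == NULL) {
		if (hsize <= (1ULL << 8)) {
			fprintf(stderr, "hash table allocation failed\n");
			return (1);
		}
		hsize >>= 1;	/* silently shrink the table and retry */
	}

	printf("ended up with %llu buckets\n", (unsigned long long)hsize);
	free(table);
	return (0);
}

The point being that the halving happens silently, so on a fragmented or
otherwise constrained system the resulting table can be far smaller than
intended without any visible error.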
