On 10/14/13 11:17 PM, Matthew Ahrens wrote:
> That message is about failing due to running out of memory, which your
> changes don't address.  Your changes address running out of virtual
> address space.

Actually no, in that case the machine was running "low on memory" in the
sense of there being very few large unallocated chunks. They had 128GB
of physical, of which 90GB was used by ARC - not exactly what you'd call
a traditional low memory situation (the ARC being considered
expendable). What happened was that the kernel address space got
heavily fragmented and the ARC wasn't feeling much pressure to contract,
so larger allocations started failing. What compounded the situation was
that the allocation originated in the ARC itself and was KM_SLEEP, which
meant that after a journey through the VM subsystem, the ARC *did*
attempt to contract to accommodate the new allocation, but it deadlocked
(due to having reentered itself again).

In the hash table allocation case this deadlock situation fortunately
can't happen (KM_NOSLEEP), but what will happen is that the buf_init
routine will retry after lowering its request size, potentially
resulting in a much smaller hash table with far worse performance. The
resulting slowdown can be very hard to diagnose and, since the problem
is transient (after a reboot it's most likely gone), very hard to pin
down.
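
To illustrate the retry behaviour, here is a minimal userland sketch, not
the actual arc.c code. try_alloc_nosleep() and hash_table_alloc() are
hypothetical names, and the largest_free_chunk parameter is an assumption
standing in for address space fragmentation: any request larger than it
fails, roughly the way a KM_NOSLEEP allocation would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a KM_NOSLEEP allocation: it never blocks and
 * returns NULL when the request is larger than the biggest contiguous
 * chunk that is assumed to be available.
 */
static void *
try_alloc_nosleep(size_t size, size_t largest_free_chunk)
{
	if (size > largest_free_chunk)
		return (NULL);
	return (calloc(1, size));
}

/*
 * Try to allocate a table of want_buckets pointers. On failure, halve the
 * bucket count and retry. Returns the number of buckets actually obtained
 * (0 if even a single bucket could not be allocated).
 */
static uint64_t
hash_table_alloc(uint64_t want_buckets, size_t largest_free_chunk,
    void ***tablep)
{
	uint64_t hsize = want_buckets;

	assert(tablep != NULL);

	while (hsize > 0) {
		void **table = try_alloc_nosleep(
		    (size_t)hsize * sizeof (void *), largest_free_chunk);
		if (table != NULL) {
			*tablep = table;
			return (hsize);
		}
		hsize >>= 1;	/* silently settle for a smaller table */
	}

	*tablep = NULL;
	return (0);
}
```

With a request for 2^20 buckets and a largest free chunk that only fits
2^17 pointers, the loop halves three times and settles on 2^17 buckets, so
for the same number of entries the average hash chain ends up 8 times
longer, and nothing reports that this happened.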

Cheers,
-- 
Saso
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
