On 02/08/2018 06:36 PM, Andrew Morton wrote:
> On Wed, 31 Jan 2018 18:04:00 -0500 daniel.m.jor...@oracle.com wrote:
>
>> lru_lock, a per-node* spinlock that protects an LRU list, is one of the
>> hottest locks in the kernel.  On some workloads on large machines, it
>> shows up at the top of lock_stat.
>
> Do you have details on which callsites are causing the problem?  That
> would permit us to consider other approaches, perhaps.

Sure, there are two paths where we're seeing contention.

In the first one, a pagevec's worth of anonymous pages is added to various LRUs when the per-cpu pagevec fills up:

  /* take an anonymous page fault, eventually end up at... */
  handle_pte_fault
    do_anonymous_page
      lru_cache_add_active_or_unevictable
        lru_cache_add
          __lru_cache_add
            __pagevec_lru_add
              pagevec_lru_move_fn
                /* contend on lru_lock */

In the second, one or more pages are removed from an LRU under one hold of lru_lock:

  // userland calls munmap or exit, eventually end up at...
  zap_pte_range
    __tlb_remove_page // returns true because we eventually hit
                      // MAX_GATHER_BATCH_COUNT in tlb_next_batch
      tlb_flush_mmu_free
        free_pages_and_swap_cache
          release_pages
            /* contend on lru_lock */

For broader context, we've run decision support benchmarks where lru_lock (and zone->lock) show long wait times. We're not the only ones seeing this, according to these comments in the kernel:

/*
 * zone_lru_lock is heavily contended.  Some of the functions that
 * shrink the lists perform better by taking out a batch of pages
 * and working on them outside the LRU lock.
 *
 * For pagecache intensive workloads, this function is the hottest
 * spot in the kernel (apart from copy_*_user functions).
 * ...
 */
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,

/*
 * zone->lock and the [pgdat->lru_lock] are two of the hottest locks in the
 * kernel.  So add a wild amount of padding here to ensure that they fall
 * into separate cachelines. ...
 */

Anyway, if you're seeing this lock in your workloads, I'm interested in hearing what you're running so we can get more real-world data on this.
