Faidon Liambotis has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/95963


Change subject: Replace Linux RPS setting with a smart mechanism
......................................................................

Replace Linux RPS setting with a smart mechanism

We were setting Receive Packet Steering as "ff" (i.e. all CPUs, up to 8
CPUs) on all RX queues of a given interface (eth0). While this is what
basically everyone on the web recommends, experience in the field has
shown that it was very unbalanced: CPU 0 got most of the load, at times
spending all of its time busy (and producing packet loss and latency),
while the other CPUs remained relatively idle.
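As a quick check of what a given rps_cpus value selects, the sketch below decodes a hex mask into CPU numbers. Bit N set means CPU N may process packets from that queue, so "ff" covers CPUs 0-7. The helper name cpus_in_mask is hypothetical, and the comma handling assumes the kernel's usual bitmap format of comma-separated 32-bit groups on larger machines.

```python
def cpus_in_mask(mask_hex):
    """Return the CPU numbers whose bits are set in an rps_cpus hex mask."""
    # Assumption: wide masks are printed as comma-separated 32-bit groups
    # (most significant group first), so dropping the commas yields one
    # contiguous hex number.
    value = int(mask_hex.replace(',', ''), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


print(cpus_in_mask('ff'))    # [0, 1, 2, 3, 4, 5, 6, 7]
print(cpus_in_mask('f0'))    # [4, 5, 6, 7]
```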

A smarter way of handling the load is to pin each queue to a separate
CPU, keeping the CPUs assigned to each queue as isolated from one another
as possible. This has proven to balance the load as fairly as possible.
Experimentation has also shown that using HyperThreading siblings (even
pairing them with each other) is actively worse, so it's best to ignore
them completely.
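Ignoring HT siblings relies on each CPU's topology/thread_siblings_list file under /sys/devices/system/cpu/. A minimal sketch of the same idea, keeping only the lowest-numbered sibling of every physical core, is shown below. It takes the sibling lists as a plain dict so it can run anywhere; the helper names and the 4-core/8-thread topology are made up for illustration. The parser also accepts the range form (e.g. "0-1") that the kernel uses when siblings are numbered consecutively.

```python
def parse_cpulist(text):
    """Parse a kernel cpulist string such as '0,4' or '0-1' into ints."""
    cpus = []
    for part in text.strip().split(','):
        if '-' in part:
            # a range like "0-1" expands to every CPU it covers
            start, end = part.split('-')
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def primary_cores(siblings_by_cpu):
    """Keep one logical CPU (the lowest-numbered sibling) per physical core."""
    return sorted({min(parse_cpulist(s)) for s in siblings_by_cpu.values()})


# Hypothetical 4-core/8-thread box where CPU N and N+4 are HT siblings
topology = {0: '0,4', 1: '1,5', 2: '2,6', 3: '3,7',
            4: '0,4', 5: '1,5', 6: '2,6', 7: '3,7'}
print(primary_cores(topology))  # [0, 1, 2, 3]
```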

Finally, experimentation has shown that for our primary use case (LVS),
Transmit Packet Steering makes little to no difference.

This replaces the older bash one-liner with a Python script that tries
to be extra smart about how to distribute queues to CPUs (and about CPUs
with HT). Unfortunately, the number of CPUs and queues can differ wildly
between boxes and, in cases such as amslvs1, can inconveniently be five
queues for four CPUs, so we need to cover multiple cases.
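To make the five-queues-for-four-CPUs case concrete, here is a sketch that mirrors the allocation performed by distribute_rx_queues_to_cpus() in the script, but returns a plan instead of writing to sysfs. plan_rps() and to_mask() are hypothetical helper names, and the queue and CPU lists are assumed to be sorted.

```python
def plan_rps(rx_queues, cpus):
    """Map each RX queue to a list of CPUs, mirroring interface-rps.py."""
    plan = {}
    if len(rx_queues) >= len(cpus):
        quot, rem = divmod(len(rx_queues), len(cpus))
        # each CPU gets `quot` queues of its own
        for i, cpu in enumerate(cpus):
            for j in range(quot):
                plan[rx_queues[i * quot + j]] = [cpu]
        # leftover queues are shared across all CPUs
        if rem:
            for rxq in rx_queues[-rem:]:
                plan[rxq] = list(cpus)
    else:
        quot, _ = divmod(len(cpus), len(rx_queues))
        # each queue gets `quot` CPUs of its own; leftover CPUs go unused
        for i, rxq in enumerate(rx_queues):
            plan[rxq] = cpus[i * quot:(i + 1) * quot]
    return plan


def to_mask(cpus):
    """Encode a CPU list as the hex string written to rps_cpus."""
    return format(sum(1 << cpu for cpu in cpus), 'x')


# amslvs1-like case: five RX queues, four (non-HT) CPUs
for rxq, cpus in sorted(plan_rps([0, 1, 2, 3, 4], [0, 1, 2, 3]).items()):
    print('rx-%d -> %s' % (rxq, to_mask(cpus)))
```

Each of the first four queues gets a CPU of its own (masks 1, 2, 4 and 8), while the fifth queue falls back to all four CPUs (mask f). When there are more CPUs than queues, the remainder of the division is ignored, so some CPUs are left out of RPS entirely.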

This has been tested manually on amslvs1 & lvs1001 and made a
considerable difference.

Change-Id: I606d222616a62563cb0a2939d3e800d04db1de39
---
M manifests/lvs.pp
D modules/generic/files/upstart/enable-rps.conf
A modules/interface/files/interface-rps.py
A modules/interface/manifests/rps.pp
A modules/interface/templates/enable-rps.conf.erb
5 files changed, 154 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/63/95963/1

diff --git a/manifests/lvs.pp b/manifests/lvs.pp
index a0f4628..6a20e86 100644
--- a/manifests/lvs.pp
+++ b/manifests/lvs.pp
@@ -950,7 +950,14 @@
                },
        }
 
-       generic::upstart_job { "enable-rps": install => "true", start => "true" }
+       interface::rps { 'eth0':
+               interface => 'eth0',
+       }
+
+       # XXX: old RPS mechanism; remove after a successful run; 2013-11-18
+       file { '/etc/init/enable-rps.conf':
+               ensure => absent,
+       }
 }
 
 # Supporting the PyBal RunCommand monitor
diff --git a/modules/generic/files/upstart/enable-rps.conf b/modules/generic/files/upstart/enable-rps.conf
deleted file mode 100644
index caed241..0000000
--- a/modules/generic/files/upstart/enable-rps.conf
+++ /dev/null
@@ -1,10 +0,0 @@
-# enable-rps
-
-description "Enable RPS on eth0 receive queues"
-author "Mark Bergsma <[email protected]>"
-
-start on filesystem
-task
-script
-       for queue in /sys/class/net/eth0/queues/rx-*; do echo ff > $queue/rps_cpus; done
-end script
diff --git a/modules/interface/files/interface-rps.py b/modules/interface/files/interface-rps.py
new file mode 100644
index 0000000..97aee59
--- /dev/null
+++ b/modules/interface/files/interface-rps.py
@@ -0,0 +1,109 @@
+#!/usr/bin/env python
+
+# Sets up Receive Packet Steering (RPS) for a given interface.
+#
+# Tries to allocate separate queues to separate CPUs, rather than follow
+# what's common advice out there (all CPUs to all queues), as experience has
+# shown a tremendous difference.
+#
+# Author: Faidon Liambotis
+# Copyright (c) 2013 Wikimedia Foundation, Inc.
+
+import os
+import glob
+import sys
+
+
+def get_value(path):
+    """Read a (sysfs) value from path"""
+    return open(path, 'r').read()[:-1]
+
+
+def write_value(path, value):
+    """Write a (sysfs) value to path"""
+    print '%s = %s' % (path, value)
+    open(path, 'w').write(value)
+
+
+def get_cpu_list():
+    """Get a list of all CPUs by their number (e.g. [0, 1, 2, 3])"""
+    path_cpu = '/sys/devices/system/cpu/'
+    cpu_nodes = glob.glob(os.path.join(path_cpu, 'cpu[0-9]*'))
+    cpus = [int(os.path.basename(c)[3:]) for c in cpu_nodes]
+
+    # filter out HyperThreading siblings (lists look like '0,4' or '0-1')
+    cores = []
+    for cpu in cpus:
+        path_threads = os.path.join(path_cpu, 'cpu%s' % cpu,
+                                    'topology', 'thread_siblings_list')
+        thread_siblings = get_value(path_threads).replace('-', ',').split(',')
+        cores.append(int(thread_siblings[0]))
+
+    # return a (unique) sorted set of CPUs without their HT siblings
+    return sorted(set(cores))
+
+
+def get_rx_queues(device):
+    """Get a list of RX queues for device"""
+    rx_nodes = glob.glob(os.path.join('/sys/class/net', device, 'queues',
+                                      'rx-*'))
+    rx_queues = [int(os.path.basename(q)[3:]) for q in rx_nodes]
+
+    return sorted(rx_queues)
+
+
+def assign_rx_queue_to_cpus(device, rx_queue, cpus):
+    """Assign a device's RX queue to a CPU set"""
+    bitmask = 0
+    for cpu in cpus:
+        bitmask |= 1 << cpu
+
+    rx_node = os.path.join('/sys/class/net', device, 'queues',
+                           'rx-%s' % rx_queue, 'rps_cpus')
+
+    write_value(rx_node, format(bitmask, 'x'))
+
+
+
+def distribute_rx_queues_to_cpus(device, rx_queues, cpu_list):
+    """Performs a smart distribution of RX queues to CPUs (or vice-versa)"""
+    if len(rx_queues) >= len(cpu_list):
+        # try to divide queues / CPUs and assign N CPUs per queue, isolated
+        (quot, rem) = divmod(len(rx_queues), len(cpu_list))
+
+        for i, cpu in enumerate(cpu_list):
+            for j in range(quot):
+                rxq = rx_queues[i*quot + j]
+                assign_rx_queue_to_cpus(device, rxq, [cpu])
+
+        # if there are remainders, assign CPUs to them and hope for the best
+        if rem > 0:
+            for rxq in rx_queues[-rem:]:
+                assign_rx_queue_to_cpus(device, rxq, cpu_list)
+    else:
+        # do the opposite division
+        (quot, rem) = divmod(len(cpu_list), len(rx_queues))
+
+        #...and collect CPUs, then assign them together to queues
+        for i, rxq in enumerate(rx_queues):
+            cpus = []
+            for j in range(quot):
+                cpus.append(cpu_list[i*quot + j])
+            assign_rx_queue_to_cpus(device, rxq, cpus)
+
+def main():
+    try:
+        device = sys.argv[1]
+    except IndexError:
+        device = 'eth0'
+
+    cpu_list = get_cpu_list()
+    rx_queues = get_rx_queues(device)
+
+    print cpu_list
+    print rx_queues
+    distribute_rx_queues_to_cpus(device, rx_queues, cpu_list)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/modules/interface/manifests/rps.pp b/modules/interface/manifests/rps.pp
new file mode 100644
index 0000000..40327a6
--- /dev/null
+++ b/modules/interface/manifests/rps.pp
@@ -0,0 +1,28 @@
+# Definition: interface::rps
+#
+# Automagically sets RPS for an interface
+#
+# Parameters:
+# - $interface:
+#   The network interface to operate on
+define interface::rps($interface='eth0') {
+    file { '/usr/local/sbin/interface-rps':
+        owner   => 'root',
+        group   => 'root',
+        mode    => '0555',
+        source  => 'puppet:///modules/interface/interface-rps.py',
+    }
+
+    file { "/etc/init/enable-rps-$interface.conf":
+        owner   => 'root',
+        group   => 'root',
+        mode    => '0444',
+        content => template('interface/enable-rps.conf.erb'),
+    }
+
+    exec { "interface-rps $interface":
+        command   => "/usr/local/sbin/interface-rps $interface",
+        subscribe => File["/etc/init/enable-rps-$interface.conf"],
+        require   => File['/usr/local/sbin/interface-rps'],
+    }
+}
diff --git a/modules/interface/templates/enable-rps.conf.erb b/modules/interface/templates/enable-rps.conf.erb
new file mode 100644
index 0000000..9368149
--- /dev/null
+++ b/modules/interface/templates/enable-rps.conf.erb
@@ -0,0 +1,9 @@
+# enable-rps
+
+description "Enable RPS on <%= @interface %> RX queues"
+
+start on filesystem
+task
+script
+       interface-rps <%= @interface %>
+end script

-- 
To view, visit https://gerrit.wikimedia.org/r/95963
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I606d222616a62563cb0a2939d3e800d04db1de39
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Faidon Liambotis <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
