The associativity domain numbers are obtained from the hypervisor through
registers and written into memory by the guest: the packed array passed to
vphn_unpack_associativity() is therefore native-endian, not big-endian as
was assumed in the following commit:

commit b08a2a12e44eaec5024b2b969f4fcb98169d1ca3
Author: Alistair Popple <alist...@popple.id.au>
Date:   Wed Aug 7 02:01:44 2013 +1000

    powerpc: Make NUMA device node code endian safe

If a CPU home node changes, the topology gets filled with
bogus values. This leads to severe performance breakdowns.

This patch does two things:
- extract values from the packed array with shifts, in order to be endian
  neutral
- convert the resulting values to be32 as expected

Suggested-by: Anton Blanchard <an...@samba.org>
Signed-off-by: Greg Kurz <gk...@linux.vnet.ibm.com>
---

Changes in v2:
- removed the leftover __be16 *field declaration
- removed the leftover be16_to_cpup() call
- updated the comment of the magic formula
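
To make the formula easier to check by hand, here is a minimal user-space
sketch of the chunk extraction. It is not part of the patch: vphn_chunk()
and VPHN_NR_CHUNKS are hypothetical names, and uint64_t stands in for the
kernel's 64-bit long on ppc64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: 6 registers x 4 chunks of 16 bits each. */
#define VPHN_NR_CHUNKS 24u

/*
 * Hypothetical user-space stand-in for read_vphn_chunk().
 * Chunk i lives in word i / 4; the inverted 2 low bits of i, times 16,
 * give the right shift (48, 32, 16 or 0). Only shifts are used, so the
 * result does not depend on the byte order of the host.
 */
static inline uint16_t vphn_chunk(const uint64_t *packed, unsigned int i)
{
	assert(i < VPHN_NR_CHUNKS);
	return (uint16_t)(packed[i >> 2] >> ((~i & 3u) << 4));
}
```

With packed[1] = 0x8005000600070008, chunk 4 is 0x8005 (shift 48) and
chunk 5 is 0x0006 (shift 32), on both big-endian and little-endian hosts.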

Thanks again Nish... the two leftovers probably explain why PowerVM wasn't
happy with that patch. :P

--
Greg

 arch/powerpc/mm/numa.c |   64 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 52 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b835bf0..4547f91 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1369,38 +1369,78 @@ static int update_cpu_associativity_changes_mask(void)
 #define VPHN_ASSOC_BUFSIZE (6*sizeof(u64)/sizeof(u32) + 1)
 
 /*
+ * The associativity values are either 16-bit (VPHN_FIELD_MSB) or 32-bit (data
+ * or VPHN_FIELD_UNUSED). We hence need to parse the packed array into 16-bit
+ * chunks. Let's do that with bit shifts to be endian neutral.
+ *
+ *    --- 16-bit chunks -->
+ *  _________________________
+ *  |  0  |  1  |  2  |  3  |   packed[0]
+ *  -------------------------
+ *  _________________________
+ *  |  4  |  5  |  6  |  7  |   packed[1]
+ *  -------------------------
+ *            ...
+ *  _________________________
+ *  | 20  | 21  | 22  | 23  |   packed[5]
+ *  -------------------------
+ *       48    32    16     0
+ *    <------ bits -------- 
+ *
+ * We need 48-bit shift for chunks 0,4,8,12,16,20
+ *         32-bit shift for chunks 1,5,9,13,17,21
+ *         16-bit shift for chunks 2,6,10,14,18,22
+ *             no shift for chunks 3,7,11,15,19,23
+ *
+ * The inverted 2 lo bits of the chunk index multiplied by 16 give the shift.
+ * The chunk index divided by 4 (the remaining hi bits) gives the index in packed[].
+ *
+ * For example:
+ * chunk  0 = packed[0/4]  >> (~0b00000 & 0b11) * 16 = bits 48..63 in packed[0]
+ * chunk  5 = packed[5/4]  >> (~0b00101 & 0b11) * 16 = bits 32..47 in packed[1]
+ * chunk 22 = packed[22/4] >> (~0b10110 & 0b11) * 16 = bits 16..31 in packed[5]
+ */
+static inline u16 read_vphn_chunk(const long *packed, unsigned int i)
+{
+       return packed[i >> 2] >> ((~i & 3) << 4);
+}
+
+/*
  * Convert the associativity domain numbers returned from the hypervisor
  * to the sequence they would appear in the ibm,associativity property.
  */
 static int vphn_unpack_associativity(const long *packed, __be32 *unpacked)
 {
-       int i, nr_assoc_doms = 0;
-       const __be16 *field = (const __be16 *) packed;
+       unsigned int i, j, nr_assoc_doms = 0;
 
 #define VPHN_FIELD_UNUSED      (0xffff)
 #define VPHN_FIELD_MSB         (0x8000)
 #define VPHN_FIELD_MASK                (~VPHN_FIELD_MSB)
 
-       for (i = 1; i < VPHN_ASSOC_BUFSIZE; i++) {
-               if (be16_to_cpup(field) == VPHN_FIELD_UNUSED) {
+       for (i = 1, j = 0; i < VPHN_ASSOC_BUFSIZE; i++) {
+               u16 field = read_vphn_chunk(packed, j);
+
+               if (field == VPHN_FIELD_UNUSED) {
                        /* All significant fields processed, and remaining
                         * fields contain the reserved value of all 1's.
                         * Just store them.
                         */
-                       unpacked[i] = *((__be32 *)field);
-                       field += 2;
-               } else if (be16_to_cpup(field) & VPHN_FIELD_MSB) {
+                       unpacked[i] = cpu_to_be32((u32) VPHN_FIELD_UNUSED << 16 |
+                                                 VPHN_FIELD_UNUSED);
+                       j += 2;
+               } else if (field & VPHN_FIELD_MSB) {
                        /* Data is in the lower 15 bits of this field */
-                       unpacked[i] = cpu_to_be32(
-                               be16_to_cpup(field) & VPHN_FIELD_MASK);
-                       field++;
+                       unpacked[i] = cpu_to_be32(field & VPHN_FIELD_MASK);
+                       j++;
                        nr_assoc_doms++;
                } else {
                        /* Data is in the lower 15 bits of this field
                         * concatenated with the next 16 bit field
                         */
-                       unpacked[i] = *((__be32 *)field);
-                       field += 2;
+                       unpacked[i] =
+                               cpu_to_be32((u32) field << 16 |
+                                           read_vphn_chunk(packed, j + 1));
+                       j += 2;
                        nr_assoc_doms++;
                }
        }

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev