The bits PF_IO_WORKER and PF_WQ_WORKER are tested together in
sched_submit_work() which is considered to be a hot path.
If the two bits cross the 8 or 16 bit boundary then most architecture
require multiple load instructions in order to create the constant
value. Also, such a value can not be encoded within the compare opcode.

By moving the bit definition within the same block, the compiler can
create/use one immediate value.

For some reason gcc-10 on ARM64 requires both bits to be next to each
other in order to issue "tst reg, val; bne label". Otherwise the result
is "mov reg1, val; tst reg, reg1; bne label".

Move PF_VCPU out of the way so that PF_IO_WORKER can be next to
PF_WQ_WORKER.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---

Could someone from the ARM64 camp please verify if this a gcc "bug" or
opcode/arch limitation? With PF_IO_WORKER as 1 (without the PF_VCPU
swap) I get for ARM:

| tst     r2, #33 @ task_flags,
| beq     .L998           @,

however ARM64 does here:
| mov     w0, 33  // tmp117,
| tst     w19, w0 // task_flags, tmp117
| bne     .L453           //,

the extra mov operation. Moving PF_IO_WORKER next to PF_WQ_WORKER as
this patch gives me:
| tst     w19, 48 // task_flags,
| bne     .L453           //,


 include/linux/sched.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd930efd31..2bf0af19a62a4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1489,9 +1489,10 @@ extern struct pid *cad_pid;
 /*
  * Per process flags
  */
+#define PF_VCPU                        0x00000001      /* I'm a virtual CPU */
 #define PF_IDLE                        0x00000002      /* I am an IDLE thread 
*/
 #define PF_EXITING             0x00000004      /* Getting shut down */
-#define PF_VCPU                        0x00000010      /* I'm a virtual CPU */
+#define PF_IO_WORKER           0x00000010      /* Task is an IO worker */
 #define PF_WQ_WORKER           0x00000020      /* I'm a workqueue worker */
 #define PF_FORKNOEXEC          0x00000040      /* Forked but didn't exec */
 #define PF_MCE_PROCESS         0x00000080      /* Process policy on mce errors 
*/
@@ -1515,7 +1516,6 @@ extern struct pid *cad_pid;
 #define PF_NO_SETAFFINITY      0x04000000      /* Userland is not allowed to 
meddle with cpus_mask */
 #define PF_MCE_EARLY           0x08000000      /* Early kill for mce process 
policy */
 #define PF_MEMALLOC_NOCMA      0x10000000      /* All allocation request will 
have _GFP_MOVABLE cleared */
-#define PF_IO_WORKER           0x20000000      /* Task is an IO worker */
 #define PF_FREEZER_SKIP                0x40000000      /* Freezer should not 
count it as freezable */
 #define PF_SUSPEND_TASK                0x80000000      /* This thread called 
freeze_processes() and should not be frozen */
 
-- 
2.28.0

Reply via email to