On 2016-09-19 14:01, Robin Murphy wrote:
On 19/09/16 18:36, Brent DeGraaf wrote:
According to section 6.3.8 of the ARM Programmer's Guide, non-temporal
loads and stores do not verify that address dependency is met between a
load of an address to a register and a subsequent non-temporal load or
store using that address on the executing PE. Therefore, context switch code and subroutine calls that use non-temporally accessed addresses as
parameters that might depend on a load of an address into an argument
register must ensure that ordering requirements are met by introducing
a barrier prior to the successive non-temporal access. Add appropriate
barriers whereever this specific situation comes into play.

Signed-off-by: Brent DeGraaf <bdegr...@codeaurora.org>
 arch/arm64/kernel/entry.S  | 1 +
 arch/arm64/lib/copy_page.S | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 441420c..982c4d3 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -679,6 +679,7 @@ ENTRY(cpu_switch_to)
        ldp     x27, x28, [x8], #16
        ldp     x29, x9, [x8], #16
        ldr     lr, [x8]
+ dmb nshld // Existence of instructions with loose load-use dependencies (e.g. ldnp/stnp) make this barrier necessary
        mov     sp, x9
        and     x9, x9, #~(THREAD_SIZE - 1)
        msr     sp_el0, x9
diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
index 4c1e700..21c6892 100644
--- a/arch/arm64/lib/copy_page.S
+++ b/arch/arm64/lib/copy_page.S
@@ -47,6 +47,8 @@ alternative_endif
        ldp     x14, x15, [x1, #96]
        ldp     x16, x17, [x1, #112]

+       dmb     nshld // In case x0 (for stnp) is dependent on a load

The ARMv8 ARM (B2.7.2 in issue j) says that when an address dependency
exists between a load and a subsequent LDNP, *other* observers may
observe those accesses in any order. How's that related to an STNP on
the same CPU?


        mov     x18, #(PAGE_SIZE - 128)
        add     x1, x1, #128

Yes, I have seen the section in the ARM ARM about this. But the Programmer's Guide goes further, even providing a concrete example:

"Non-temporal loads and stores relax the memory ordering requirements...the LDNP instruction might be observed before the preceding LDR instruction, which can result in reading from an unpredictable
address in X0.

For example:
LDR X0, [X3]
LDNP X2, X1, [X0]
To correct the above, you need an explicit load barrier:
LDR X0, [X3]
LDNP X2, X1, [X0]"

Did the ARM ARM leave this out? Or is the Programmer's Guide section incorrect?

Thanks for your comments,

Reply via email to