On Tue, 24 Jun 2025 07:27:47 +0200 Christophe Leroy <christophe.le...@csgroup.eu> wrote:
> On 22/06/2025 at 18:20, David Laight wrote:
> > On Sun, 22 Jun 2025 11:52:38 +0200
> > Christophe Leroy <christophe.le...@csgroup.eu> wrote:
> >
> >> Masked user access avoids the address/size verification by access_ok().
> >> Although its main purpose is to skip the speculation in the
> >> verification of user address and size, and hence avoid the need for
> >> speculation mitigation, it also has the advantage of reducing the
> >> number of instructions needed, so it also benefits platforms that
> >> don't need speculation mitigation, especially when the size of the
> >> copy is not known at build time.
> >
> > It also removes a conditional branch that is quite likely to be
> > statically predicted 'the wrong way'.
>
> But include/asm-generic/access_ok.h defines access_ok() as:
>
> #define access_ok(addr, size) likely(__access_ok(addr, size))
>
> So GCC uses the 'unlikely' variant of the branch instruction to force
> the correct prediction, doesn't it?

Nope...
Most architectures don't have likely/unlikely variants of branches.
So all gcc can do is decide which path is the fall-through and
whether the branch is forwards or backwards.
Additionally, unless there is code in both the 'if' and 'else' clauses,
the [un]likely seems to have no effect.

So on a simple cpu that predicts 'backwards branches taken' you can get
the desired effect - but it may need an 'asm comment' to force the
compiler to generate the required branches (eg a forwards branch
directly to a backwards unconditional jump).

On x86 it is all more complicated.
I think the pre-fetch code is likely to assume 'not taken' (but it
might use stale info from the cache line).
The predictor itself never does 'static prediction' - it is always
based on the referenced branch prediction data structure.
So, unless you are in a loop (eg running a benchmark!), there is pretty
much a 50% chance of a branch mispredict.

I've been trying to benchmark different versions of the u64 * u64 / u64
function - and I think mispredicted branches make a big difference.
I need to sit down and sequence the test cases so that I can see the
effect of each branch!

> >
> >> Unlike x86_64, which masks the address to 'all bits set' when the
> >> user address is invalid, here the address is set to an address in
> >> the gap. It avoids relying on the zero page to catch offset
> >> accesses. On book3s/32 it makes sure the opening remains on the
> >> user segment. The extra cost is a single instruction in the masking.
> >
> > That isn't true (any more).
> > Linus changed the check to (approx):
> >
> >	if (uaddr > TASK_SIZE)
> >		uaddr = TASK_SIZE;
> >
> > (Implemented with a conditional move)
>
> Ah ok, I overlooked that. I didn't know the cmove instruction; it seems
> similar to the isel instruction on powerpc e500.

It got added for the 386 - I learnt 8086 :-)
I suspect x86 got there first...

Although it is called 'conditional move', I very much suspect the write
is actually unconditional, so the hardware implementation is much the
same as 'add carry' except that the ALU operation is a simple multiplex.
Which means it is unlikely to be speculative.

	David

>
> Christophe
>
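
To make the first quoted paragraph concrete: the verification that masked
user access avoids is an overflow-safe range check on (addr, size). Below
is a minimal user-space sketch along the lines of the asm-generic check;
the LIMIT value and the function name are invented for illustration, this
is a simplification rather than the kernel's exact code.

#include <stdint.h>
#include <stdbool.h>

#define LIMIT 0x7ffffffff000UL		/* stand-in for TASK_SIZE_MAX */

/* Overflow-safe form of 'addr + size <= limit': checking
 * 'size <= LIMIT' first means 'LIMIT - size' cannot wrap. */
static bool access_in_range(uintptr_t addr, unsigned long size)
{
	return size <= LIMIT && addr <= LIMIT - size;
}

int main(void)
{
	/* A huge size must fail even though addr alone looks fine. */
	return access_in_range(4096, ~0UL) ? 1 : 0;
}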
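
As for what the quoted likely() actually does: in the generic headers it
boils down to __builtin_expect(), which only biases gcc's block layout -
it does not select a special branch instruction on most architectures.
A sketch (the macro bodies match the usual compiler.h definitions; the
demo function is invented):

#include <stdio.h>

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* gcc places the unlikely path out of line, but the conditional
 * branch it emits is still an ordinary jcc/bcc instruction. */
int check(long addr, long limit)
{
	if (unlikely(addr > limit)) {
		puts("bad address");
		return -1;
	}
	return 0;
}

int main(void)
{
	return check(3, 10);
}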
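
The 'asm comment' trick mentioned above can be sketched as follows,
assuming gcc: an empty asm statement is opaque to the optimiser, so the
annotated clause keeps a real out-of-line branch even when it contains
no other code. The function and its names are invented.

/* Sketch, assuming gcc: asm("") cannot be deleted or moved, so the
 * unlikely path keeps its own branch target and is laid out as a
 * forwards branch off the fall-through path. */
static inline int range_check(unsigned long uaddr, unsigned long limit)
{
	if (__builtin_expect(uaddr > limit, 0)) {
		asm volatile("");	/* keep this clause out of line */
		return -1;
	}
	return 0;
}

int main(void)
{
	return range_check(3, 10);
}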
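
Finally, a user-space sketch of the clamp described in the quoted change
(approximate, as noted above; the limit value and the function name are
invented). Because the clamp is branchless, gcc typically compiles it to
cmp + cmov on x86_64, or isel on e500, so there is no conditional branch
for the predictor to mispredict or speculate through.

#include <stdint.h>

#define TASK_SIZE_MAX 0x7ffffffff000UL	/* illustrative value only */

/* Clamp an out-of-range user address to the guard area at the top
 * of the user range instead of rejecting it with a branch. */
static inline uintptr_t mask_user_addr(uintptr_t uaddr)
{
	return uaddr > TASK_SIZE_MAX ? TASK_SIZE_MAX : uaddr;
}

int main(void)
{
	/* An invalid pointer is clamped to TASK_SIZE_MAX, so a later
	 * access faults in the gap rather than hitting user data. */
	return mask_user_addr(~0UL) == TASK_SIZE_MAX ? 0 : 1;
}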