On Tue, 23 Jun 2026 18:34:13 GMT, Sergey Bylokhov <[email protected]> wrote:

> Then maybe an existing benchmark can be updated to show some improvement, or 
> a new one added?

As @ferakocz said, the intrinsic is useful to ensure constant-time execution in 
all cases. We don't actually need it to improve performance. We would only need 
to worry about it if it were responsible for a noticeable degradation in 
performance.

@ferakocz says we don't see that in the tests he has performed, and I think that 
means we are unlikely to face that situation with any other kernels that rely on 
conditionalAssign. I'm reassured by the fact that the Java version runs roughly 
twice as fast as the intrinsic. That matches the opportunity the provision of 
bytecode offers the compiler, as noted above, to halve the number of EOR 
instructions. Since that optimization is not available when set is not 
hard-wired, I don't think we need to care about the result of this test.
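
For anyone who has not looked at the method, here is a minimal sketch of the 
mask-and-XOR pattern that a constant-time conditional assign typically uses. 
The class name and the exact signature are assumptions for illustration, not 
the code under review in this PR.

```java
public class ConditionalAssignSketch {

    // Hypothetical stand-in for conditionalAssign: copies b into a when
    // set == 1 and leaves a unchanged when set == 0, without branching on set.
    // Assumes set is exactly 0 or 1 and that a and b have the same length.
    static void conditionalAssign(int set, long[] a, long[] b) {
        long mask = -(long) set; // 0 -> all zero bits, 1 -> all one bits
        for (int i = 0; i < a.length; i++) {
            long diff = mask & (a[i] ^ b[i]); // first XOR: bits where a and b differ
            a[i] ^= diff;                     // second XOR: flip them only if mask is set
        }
    }

    public static void main(String[] args) {
        long[] a = {1L, 2L, 3L};
        long[] b = {7L, 8L, 9L};
        conditionalAssign(0, a, b);
        System.out.println(java.util.Arrays.toString(a)); // a is unchanged
        conditionalAssign(1, a, b);
        System.out.println(java.util.Arrays.toString(a)); // a now equals b
    }
}
```

In this form each limb costs two XORs and one AND whatever the value of set, 
and no branch depends on it. When set is a compile-time constant the JIT is 
free to simplify the expression, which is not possible in the general case.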

@mrserb, please feel free to create a new benchmark that reliably tests the Java 
code vs the intrinsic for the general case (i.e. where set cannot be derived or 
predicted). If there is a true degradation on either x86 or aarch64, we can 
then consider whether or not to keep the intrinsic.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30941#issuecomment-4787118490
