On Sun, 7 Apr 2024 05:14:08 GMT, Francesco Nigro <d...@openjdk.org> wrote:

>> I went ahead and tried a pure-Java implementation, and it is faster for 
>> small sizes (up to 8) and only about 1.5x slower for larger sizes, so that 
>> might make for an interesting fallback if there is no customized assembler 
>> implementation available or if the size is known to be small.
>> 
>> Ideally, I think we would want C2 to be more aware of setMemory stores, so 
>> that it can remove redundant stores, like it does with InitializeNode.
>
> @dean-long in my old PR I have done the same, choosing a (not yet) 
> configurable cutoff value. 
> 
> See https://github.com/openjdk/jdk/pull/16760

As an experiment I added the Java code that @franz1981 supplied and compared 
its performance against the intrinsic stub.  I used 128 bytes as the cutoff 
value, as in that code.  I saw about 0.75 to 1 ns of improvement, and only for 
sizes of 1 or 2 bytes.  For anything larger, the stub performed better.
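
For illustration only, a size-based cutoff of the kind described above might 
look roughly like the sketch below.  This is not the code from the linked PR: 
the class name SetMemorySketch, the constant CUTOFF_BYTES, and the use of a 
byte[] with Arrays.fill as a stand-in for the intrinsic stub are all 
assumptions made to keep the example self-contained.  Whether the comparison 
against the cutoff is strict is also a guess.

```java
import java.util.Arrays;

// Hypothetical sketch: dispatch between a plain Java loop and a bulk fill
// based on a size cutoff. Arrays.fill stands in for the intrinsic stub.
final class SetMemorySketch {
    // Assumed cutoff, taken from the 128-byte value mentioned in the message.
    static final int CUTOFF_BYTES = 128;

    static void fill(byte[] dst, int offset, int length, byte value) {
        if (offset < 0 || length < 0 || offset > dst.length - length) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
        }
        if (length < CUTOFF_BYTES) {
            // Small sizes: simple byte-at-a-time loop in pure Java.
            for (int i = 0; i < length; i++) {
                dst[offset + i] = value;
            }
        } else {
            // Larger sizes: hand off to the bulk routine (stand-in for the stub).
            Arrays.fill(dst, offset, offset + length, value);
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[256];
        fill(buf, 0, 2, (byte) 1);     // takes the Java loop path
        fill(buf, 16, 200, (byte) 2);  // takes the bulk path
        System.out.println(buf[0] + " " + buf[1] + " " + buf[16] + " " + buf[215]);
    }
}
```

Both paths must produce the same result for every size, so the cutoff is 
purely a performance knob that would need to be tuned by measurement on each 
platform.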

@mcimadamore Is there any way to disable some of the optimizations C2 will 
attempt on the IR?  We need to maintain atomicity, so vectorization shouldn't 
occur, for instance.  This seems like a rat-hole that would need constant 
maintenance as C2 optimizations get better.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2046208254
