On Thu, 31 Mar 2022 03:53:15 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

>> Yeah, maybe I misunderstood what you mean. So maybe the masked store 
>> `(store(src, m))` could be implemented with:
>> 
>> 1) v1 = load
>> 2) v2 = blend(v1, src, m)
>> 3) store(v2)
>> 
>> Let's record this in JBS and fix it with a follow-up patch. Thanks!
>
> The optimization for masked store is recorded in:
> https://bugs.openjdk.java.net/browse/JDK-8284050

> The blend should be with the intended-to-store vector, so that masked lanes 
> contain the elements to be stored and unmasked lanes contain the loaded 
> elements, which are then stored back unchanged.

It may not work if part of the accessed memory lies beyond the legally 
accessible address space of the process; a corner case is a vector that 
straddles a page boundary. Re-composing an intermediate vector that only 
partially contains actual updates, and then performing a full-vector write to 
the destination address, may therefore not work in all scenarios.
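
To make the concern concrete, here is a minimal scalar sketch in plain Java. It
is not the Vector API or the HotSpot implementation: `LANES`, `maskedStore` and
`loadBlendStore` are hypothetical names, and the end of a Java array stands in
for the end of accessible memory (for example a page boundary).

```java
public class MaskedStoreSketch {
    // Hypothetical vector width, chosen only for illustration.
    static final int LANES = 4;

    // Model of a true masked store: only lanes with a set mask bit touch memory.
    static void maskedStore(int[] mem, int offset, int[] src, boolean[] m) {
        for (int i = 0; i < LANES; i++) {
            if (m[i]) {
                mem[offset + i] = src[i];
            }
        }
    }

    // Model of the proposed emulation: full load, blend, full store.
    // Every lane is read and written, whether or not its mask bit is set.
    static void loadBlendStore(int[] mem, int offset, int[] src, boolean[] m) {
        int[] v1 = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            v1[i] = mem[offset + i];          // 1) v1 = load
        }
        int[] v2 = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            v2[i] = m[i] ? src[i] : v1[i];    // 2) v2 = blend(v1, src, m)
        }
        for (int i = 0; i < LANES; i++) {
            mem[offset + i] = v2[i];          // 3) store(v2)
        }
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        boolean[] m = {true, true, false, false};

        // Tail of the array: only two elements remain after offset 4.
        int[] c = {1, 2, 3, 4, 5, 6};
        maskedStore(c, 4, src, m);            // touches c[4] and c[5] only

        int[] d = {1, 2, 3, 4, 5, 6};
        try {
            loadBlendStore(d, 4, src, m);     // reads d[6], which does not exist
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("full-width access went past accessible memory");
        }
    }
}
```

When the whole vector lies inside the array, both methods leave memory in the
same state. They differ only when some unmasked lanes fall outside it, which is
the page-boundary case described above.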

-------------

PR: https://git.openjdk.java.net/jdk/pull/8035
