> Hi All,
> 
> This patch optimizes sub-word gather operations for x86 targets with AVX2 and
> AVX512 features.
> 
> Summary of changes:
> 
> 1) Intrinsify sub-word gather using a hybrid algorithm: a partially unrolled
> scalar loop first accumulates the values at the gather indices into a
> quadword (64-bit) slice, and a vector permutation then places the slice into
> the appropriate vector lanes. This prevents code bloat and generates a compact
> JIT sequence. Coupled with the savings from avoiding the expensive array
> allocation in the existing Java implementation, this translates into
> significant performance gains of 1.3-5x on the included micro-benchmark.
> 
> 
> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
> 
> 
> 2) The patch was also compared against a modified Java fallback
> implementation that replaces the temporary array allocation with a
> zero-initialized vector and a scalar loop that inserts the gathered values
> into the vector. However, a vector insert into the higher vector lanes is a
> three-step process: it first extracts the upper 128-bit vector lane, updates
> it with the gathered sub-word value, and then inserts the lane back into its
> original position. This makes inserts into higher-order lanes costly compared
> to the proposed solution. In addition, the JIT code generated for the modified
> fallback implementation was very bulky, which may impact inlining decisions
> in caller contexts.
> 
> 3) Some minor adjustments to the existing gather instruction patterns for
> double/quad words.
> 
> 
> Kindly review and share your feedback.
> 
> 
> Best Regards,
> Jatin
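
For readers unfamiliar with the shape of the hybrid algorithm described in item 1, the following is a minimal plain-Java model of it for byte lanes. It is only a sketch under stated assumptions, not the C2 intrinsic from the patch: the class and method names are hypothetical, the lane count is assumed to be a multiple of 8, little-endian lane order within a slice is assumed, and a `long[]` stands in for the vector register, so storing a slice into the array takes the place of the vector permutation.

```java
import java.util.Arrays;

// Hypothetical model of sub-word (byte) gather; not code from the patch.
class SubwordGatherSketch {

    // Reference semantics: lane i receives src[offset + indexMap[i]].
    static byte[] gatherReference(byte[] src, int offset, int[] indexMap) {
        byte[] out = new byte[indexMap.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = src[offset + indexMap[i]];
        }
        return out;
    }

    // Hybrid shape: a short scalar loop packs 8 gathered bytes into one
    // 64-bit slice, then the slice is placed at its position in the result.
    // Assumption: indexMap.length is a multiple of 8.
    static long[] gatherSlices(byte[] src, int offset, int[] indexMap) {
        long[] slices = new long[indexMap.length / 8];
        for (int s = 0; s < slices.length; s++) {
            long slice = 0L;
            for (int j = 0; j < 8; j++) {
                // Mask to avoid sign extension, then shift into byte position j.
                long b = src[offset + indexMap[s * 8 + j]] & 0xFFL;
                slice |= b << (8 * j);
            }
            // Stand-in for the vector permutation that places the slice.
            slices[s] = slice;
        }
        return slices;
    }

    // Unpacks the quadword slices back into byte lanes (little-endian order).
    static byte[] unpack(long[] slices) {
        byte[] out = new byte[slices.length * 8];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (slices[i / 8] >>> (8 * (i % 8)));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] src = new byte[64];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) (i * 3 - 50);
        }
        int[] indexMap = {5, 0, 7, 7, 1, 20, 3, 9, 15, 14, 13, 12, 11, 10, 9, 8};
        byte[] expected = gatherReference(src, 4, indexMap);
        byte[] actual = unpack(gatherSlices(src, 4, indexMap));
        System.out.println(Arrays.equals(expected, actual));
    }
}
```

In this model no temporary sub-word array is written per element; each group of eight lanes costs one scalar loop plus a single slice placement, which is the property item 1 relies on to keep the generated sequence compact.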

Jatin Bhateja has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains ten additional 
commits since the last revision:

 - Refined AVX3 implementation with integral gather.
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650
 - Fix incorrect comment
 - Review comments resolutions.
 - Review comments resolutions.
 - Review comments resolutions.
 - Restricting masked sub-word gather to AVX512 target to align with integral 
gather support.
 - Review comments resolution.
 - 8318650: Optimized subword gather for x86 targets.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16354/files
  - new: https://git.openjdk.org/jdk/pull/16354/files/328b2217..a6f0f8cf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=06-07

  Stats: 842039 lines in 5288 files changed: 204894 ins; 551932 del; 85213 mod
  Patch: https://git.openjdk.org/jdk/pull/16354.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16354/head:pull/16354

PR: https://git.openjdk.org/jdk/pull/16354
