On Tue, 9 Jan 2024 06:13:44 GMT, Jatin Bhateja <[email protected]> wrote:
>> Yes, IF it is vectorized, then there is no difference between high and low
>> density. My concern was more whether vectorization is preferable to the
>> scalar alternative in the low-density case, where branch prediction is more
>> stable.
>
> At runtime we do need to scan the entire mask to pick the compressible lane
> corresponding to each set mask bit. Thus the loop overhead of the mask
> compare (BTW, masks are held in a vector register on AVX2 targets) and jump
> will be incurred anyway. In addition, for a sparsely populated mask we may
> incur an additional misprediction penalty for not taking the if block, which
> extracts an element from the appropriate source vector lane and inserts it
> into the destination vector lane. Overall, the vector solution will win for
> the most common cases with varying masks, and also for very sparsely
> populated masks. Here is the result of setting just a single mask bit.
>
>
> @Benchmark
> public void fuzzyFilterIntColumn() {
>     int i = 0;
>     int j = 0;
>     long maskctr = 1;
>     int endIndex = ispecies.loopBound(size);
>     for (; i < endIndex; i += ispecies.length()) {
>         IntVector vec = IntVector.fromArray(ispecies, intinCol, i);
>         VectorMask<Integer> pred = VectorMask.fromLong(ispecies, 1);
>         vec.compress(pred).intoArray(intoutCol, j);
>         j += pred.trueCount();
>     }
> }
>
>
> Baseline:
> Benchmark                                   (size)   Mode  Cnt     Score  Error   Units
> ColumnFilterBenchmark.fuzzyFilterIntColumn    1024  thrpt    2   379.059         ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    2047  thrpt    2   188.355         ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    4096  thrpt    2    95.315         ops/ms
>
>
> Withopt:
> Benchmark                                   (size)   Mode  Cnt     Score  Error   Units
> ColumnFilterBenchmark.fuzzyFilterIntColumn    1024  thrpt    2  7390.074         ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    2047  thrpt    2  3483.247         ops/ms
> ColumnFilterBenchmark.fuzzyFilterIntColumn    4096  thrpt    2  1823.817         ops/ms
Nice, thanks for the data!
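
For anyone following the thread, the per-lane scan described in the quoted reply (test each mask bit, and on a set bit copy the source lane to the next free destination lane) could be sketched in plain Java roughly as below. This is only an illustration of the compress semantics under discussion, not the actual JDK fallback code; the class and method names (`ScalarCompressSketch`, `compressLanes`) are hypothetical, and the mask is assumed to fit in a `long` as with `VectorMask.fromLong`.

```java
public final class ScalarCompressSketch {

    /**
     * Illustrative scalar emulation of a masked compress over one vector's
     * worth of lanes. Copies src[srcOffset + lane] for every set mask bit into
     * consecutive slots of dst starting at dstOffset, and returns how many
     * lanes were written (the equivalent of the mask's true count).
     */
    public static int compressLanes(int[] src, int srcOffset, int lanes,
                                    long mask, int[] dst, int dstOffset) {
        int written = 0;
        for (int lane = 0; lane < lanes; lane++) {
            // This is the data-dependent branch the thread is talking about:
            // with a sparse mask it is mostly not taken.
            if ((mask & (1L << lane)) != 0) {
                dst[dstOffset + written] = src[srcOffset + lane];
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        int[] in = {10, 11, 12, 13, 14, 15, 16, 17};
        int[] out = new int[in.length];
        // Single set bit (lane 0), mirroring the benchmark's fromLong(species, 1).
        int count = compressLanes(in, 0, in.length, 1L, out, 0);
        System.out.println("lanes written: " + count + ", first value: " + out[0]);
    }
}
```

The loop visits every lane regardless of how many mask bits are set, which is why the scan and branch overhead is paid even for a one-bit mask.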
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446138902