On Thu, 4 Sep 2025 06:26:36 GMT, Hannes Greule <hgre...@openjdk.org> wrote:
>> This patch optimizes PopCount value transforms using KnownBits information.
>> Following are the results of the micro-benchmark included with the patch
>>
>> System: 13th Gen Intel(R) Core(TM) i3-1315U
>>
>> Baseline:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  215460.670          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  294014.826          ops/s
>>
>> Withopt:
>> Benchmark                                      Mode  Cnt       Score   Error  Units
>> PopCountValueTransform.LogicFoldingKerenLong  thrpt    2  389978.082          ops/s
>> PopCountValueTransform.LogicFoldingKerenlInt  thrpt    2  417261.583          ops/s
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> The change looks good, but I wonder:
>
> - if it makes sense to have some kind of IR tests (i.e., it's folded away
>   when unneeded, when the input is a constant, ...)?
> - whether the explanation could be simplified: Assuming a correct
>   implementation of the KnownBits canonicalization, we can argue
>   - `_zeroes` has the bits set that are known to be always 0. So
>     `BitsPer<Type> - popCount(x)` gives you an upper limit of how many bits
>     *might* be 1. And `BitsPer<Type> - popCount(_zeroes)` is equivalent to
>     `popCount(~_zeroes)`.
>   - `_ones` has the bits set that are known to be always 1. Trivially,
>     `popCount(_ones)` is a valid lower bound.
>   - The rest repeats how `adjust_bits_from_unsigned_bounds` works, but
>     that's not specific to the popcount nodes.

Hi @SirYwell, @chhagedorn, @eme64,

I have addressed your comments. Let me know if this is good to land in.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27075#issuecomment-3334870778
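
For readers following the bounds argument quoted above, here is a minimal Java sketch of it. It is not the C2 code from the patch: the class `PopCountBounds` and the method `bounds` are hypothetical names used only for illustration, and the sketch assumes `zeroes` and `ones` follow the convention described in the quoted review (a set bit in `zeroes` means that bit is known to be 0, a set bit in `ones` means that bit is known to be 1).

```java
// Illustrative only: hypothetical helper, not part of the JDK or of the patch.
final class PopCountBounds {

    // Returns {lower, upper} bounds on Integer.bitCount(x) for any int x
    // that is consistent with the given known-bits masks.
    static int[] bounds(int zeroes, int ones) {
        // Every bit known to be 1 contributes to the count: lower bound.
        int lower = Integer.bitCount(ones);
        // Only bits not known to be 0 can contribute: upper bound.
        // Same value as Integer.SIZE - Integer.bitCount(zeroes).
        int upper = Integer.bitCount(~zeroes);
        return new int[] {lower, upper};
    }

    public static void main(String[] args) {
        int zeroes = 0xFFFF0000; // upper 16 bits known to be 0
        int ones   = 0x00000003; // lowest 2 bits known to be 1
        int[] b = bounds(zeroes, ones);
        System.out.println("lower=" + b[0] + " upper=" + b[1]); // lower=2 upper=16
    }
}
```

When the two bounds coincide, the population count is a compile-time constant, which is the kind of case the suggested IR tests would check for folding.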