On Tue, Jun 24, 2025 at 5:22 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > > > Ideally we should catch repeated constants more generally since
> > > > this appears elsewhere too.
> > > > I am not quite sure where to fit it best. We already have a
> > > > machine specific task that loads 0 into SSE register which is kind
> > > > of similar to this as well.
> > > > 3) Figure out what are reasonable MOVE_RATIO/CLEAR_RATIO defaults
> > > > 4) Possibly go with the entry point idea?
> Considering the test results on microbenchmark and actual workloads,
> the increase in codesize is not much (potentially not much impact on
> icache and processor front-end), and in practice both scalar move and
> SSE move are better than rep stos (size less than a specific
> constant). So maybe we'll just adopt H.J's new patch.
> Any thoughts?

I'd just mention that the Linux kernel doesn't use SSE strategies, so
perhaps some benchmarks with -mno-sse would be beneficial to set optimal
thresholds for -mno-sse as well.

Uros.