On Tue, Jun 24, 2025 at 5:22 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > > >       Ideally we should catch repeated constants more generally since
> > > >       this appears elsewhere too.
> > > >       I am not quite sure where to fit it best.  We already have a
> > > >       machine specific task that loads 0 into SSE register which is kind
> > > >       of similar to this as well.
> > > >   3) Figure out what are reasonable MOVE_RATIO/CLEAR_RATIO defaults
> > > >   4) Possibly go with the entry point idea?
> Considering the test results on microbenchmarks and actual workloads,
> the increase in codesize is not much (potentially not much impact on
> icache and processor front-end), and in practice both scalar move and
> SSE move are better than rep stos (size less than a specific
> constant). So maybe we'll just adopt H.J's new patch.
> Any thoughts?

I'd just mention that the Linux kernel doesn't use SSE strategies, so
perhaps some benchmarks with -mno-sse would be beneficial, to set
optimal thresholds for that case as well.

Uros.
