https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #17 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We already have

/* This function adjusts the unroll factor based on
   the hardware capabilities. For ex, bdver3 has
   a loop buffer which makes unrolling of smaller
   loops less important. This function decides the
   unroll factor using number of memory references
   (value 32 is used) as a heuristic. */

static unsigned
ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop)

which triggers with TARGET_ADJUST_UNROLL

/* X86_TUNE_ADJUST_UNROLL: This enables adjusting the unroll factor based
   on hardware capabilities. Bdver3 hardware has a loop buffer which makes
   unrolling small loop less important. For, such architectures we adjust
   the unroll factor so that the unrolled loop fits the loop buffer.  */
DEF_TUNE (X86_TUNE_ADJUST_UNROLL, "adjust_unroll_factor", m_BDVER3 | m_BDVER4)

so perhaps what you propose can be done by making this one more general?
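
For readers without the source at hand, here is a rough standalone sketch of the kind of heuristic the first quoted comment describes. It is not the GCC implementation. The quoted comment only says that the number of memory references and the value 32 are used. The clamping formula, the name adjust_unroll, the constant MEM_REF_BUDGET and the precomputed mem_count argument are all assumptions made for illustration.

```c
/* Hypothetical stand-in for the value 32 mentioned in the quoted
   comment; the name is not GCC's.  */
#define MEM_REF_BUDGET 32u

/* Sketch only: cap the requested unroll factor so that the unrolled
   body holds at most MEM_REF_BUDGET memory references.  mem_count is
   assumed to be the number of memory references in one iteration of
   the loop, counted by the caller.  */
static unsigned
adjust_unroll (unsigned nunroll, unsigned mem_count)
{
  /* No memory references, or already over budget: leave the
     requested factor untouched.  */
  if (mem_count == 0 || mem_count > MEM_REF_BUDGET)
    return nunroll;

  unsigned cap = MEM_REF_BUDGET / mem_count;
  return nunroll < cap ? nunroll : cap;
}
```

Under these assumptions, a loop with 10 memory references and a requested factor of 8 would be limited to 3, while a loop with 2 memory references would keep the full factor of 8. Making the hook more general would presumably mean replacing the fixed budget with a per-target parameter.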