On 9/23/25 8:22 AM, Martin Wilck wrote:
> On Mon, 2025-09-22 at 20:38 +0200, Martin Wilck wrote:
>> Not necessarily. path_group_prio_update() calculates the average of the
>> path priorities in the group. With ALUA and groups of 6+ paths, an
>> optimized group with one healthy path will have a lower prio (8) than a
>> non-optimized group with all healthy paths (10). In such a case it
>> could happen that multipathd switches to the non-optimized group and
>> never switches back.
> Sorry, this example was incorrect. Only paths in UP or GHOST state are
> counted toward the PG prio. In this example, the optimized group would
> still have p = 50, and what I described would not occur.
> The example would be correct if the 5 non-"healthy" paths in the first
> PG were in standby (GHOST) state, resulting in
> p = (50 + 5 * 1) / 6 = 55 / 6 = 9 (integer division). But that's a very
> different scenario, and highly theoretical.
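A minimal Python sketch of the averaging rule described above follows. It is for illustration only and is not the multipath-tools C code. Several pieces are assumptions: the `Path` type, the state strings, and the function name `pg_prio` are hypothetical. The ALUA values 50/10/1 (optimized/non-optimized/standby) are chosen to match the numbers used in this thread. The sketch reproduces both cases. Failed paths are excluded, so the optimized group keeps p = 50. Standby (GHOST) paths are included, and they pull the group down to 9, just below the non-optimized group's 10.

```python
from dataclasses import dataclass

# Assumed ALUA prioritizer values, consistent with the numbers in this thread.
PRIO_OPTIMIZED = 50
PRIO_NON_OPTIMIZED = 10
PRIO_STANDBY = 1

# Hypothetical state names; only UP and GHOST paths count toward the PG prio.
COUNTED_STATES = ("up", "ghost")


@dataclass
class Path:
    state: str
    priority: int


def pg_prio(paths: list[Path]) -> int:
    """Average priority of the counted paths, rounded down like C integer
    division; 0 if no path is counted (hypothetical helper)."""
    counted = [p.priority for p in paths if p.state in COUNTED_STATES]
    if not counted:
        return 0
    return sum(counted) // len(counted)


# Optimized PG: 1 healthy path + 5 failed paths. The failed paths' priority
# value is irrelevant because they are excluded from the average.
optimized_failed = [Path("up", PRIO_OPTIMIZED)] + [Path("down", 0) for _ in range(5)]
# Optimized PG: 1 healthy path + 5 standby (GHOST) paths, which are counted.
optimized_standby = [Path("up", PRIO_OPTIMIZED)] + [
    Path("ghost", PRIO_STANDBY) for _ in range(5)
]
# Non-optimized PG: 6 healthy paths.
non_optimized = [Path("up", PRIO_NON_OPTIMIZED) for _ in range(6)]

print(pg_prio(optimized_failed), pg_prio(optimized_standby), pg_prio(non_optimized))
# prints: 50 9 10
```

If failed paths were counted with priority 0, the first group would average 50 // 6 = 8. That matches the original, incorrect example.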
>> I would suggest setting FAILBACK_IMMEDIATE instead. It's well
>> documented that FOLLOWOVER is only for cluster environments.
> Despite the wrong example, I still think FAILBACK_IMMEDIATE makes more
> sense as a general default. We can add a comment about the vendor
> recommendations.
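The following is a rough Python model of the two failback modes. It is a hypothetical simplification, not multipathd's implementation, and `should_failback` and its parameters are invented for illustration. It follows the multipath.conf(5) description of the modes. With "immediate", multipathd fails back whenever a higher-priority path group is available. With "followover", it fails back only when the first path of a path group becomes active. The model shows why followover can leave a map on the non-optimized group. Suppose the optimized group's standby paths recover, but the group kept one UP path the whole time. Then there is no "first path" event, so followover never triggers.

```python
from enum import Enum


class Failback(Enum):
    IMMEDIATE = "immediate"
    FOLLOWOVER = "followover"


def should_failback(
    mode: Failback,
    current_pg_prio: int,
    best_pg_prio: int,
    best_pg_regained_first_path: bool,
) -> bool:
    """Hypothetical model of the failback decision.

    IMMEDIATE: switch whenever another PG has a higher priority.
    FOLLOWOVER: switch only if that PG just got its first active path back.
    """
    if best_pg_prio <= current_pg_prio:
        return False
    if mode is Failback.IMMEDIATE:
        return True
    return best_pg_regained_first_path


# The map sits on the non-optimized PG (prio 10); the optimized PG is back
# at prio 50 but never lost its last UP path, so no "first path" event.
print(should_failback(Failback.IMMEDIATE, 10, 50, False))   # True
print(should_failback(Failback.FOLLOWOVER, 10, 50, False))  # False
```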
I sent a new patch with that fixed.