On Tue, 2023-06-06 at 12:38 -0500, Benjamin Marzinski wrote:
> On Tue, Jun 06, 2023 at 04:32:13PM +0000, Martin Wilck wrote:
> > On Tue, 2023-06-06 at 10:54 -0500, Benjamin Marzinski wrote:
> > > On Tue, Jun 06, 2023 at 02:55:27PM +0000, Martin Wilck wrote:
> > > > On Mon, 2023-06-05 at 23:42 -0500, Benjamin Marzinski wrote:
> > > > > On Mon, Jun 05, 2023 at 02:08:07PM -0500, Benjamin Marzinski
> > 
> > > > >  
> > > > > Actually, after looking into this more, pushing those two
> > > > > functions
> > > > > together makes the logic more confusing. Plus
> > > > > select_path_group()
> > > > > is
> > > > > used by multiple other functions that don't need to check if
> > > > > the
> > > > > path
> > > > > groups are out of order.
> > > > 
> > > > Hm. Can it happen at all that select_path_group() returns
> > > > something
> > > > other than 1 but path_groups_in_order() returns true? 
> > > 
> > > Yes. It might even be the common case. Say a switch goes down and
> > > all
> > > the paths in the high priority pathgroup fail. The kernel will
> > > switch
> > > over to a lower priority pathgroup. As long as those paths work,
> > > it
> > > won't automatically switch back to the high priority pathgroup
> > > when
> > > we
> > > tell it that those failed paths have recovered. It's multipath's
> > > job
> > > to
> > > tell it when to proactively switch pathgroups. Since multipath
> > > doesn't
> > > update the priority of failed paths, the pathgroups should still
> > > look
> > > the same (unless you use group_by_prio and the path fails between
> > > checking the state and running the prioritizer, in which case you
> > > will
> > > likely get a PRIO_UNDEF and reconfigure the pathgroups, but
> > > that's
> > > the
> > > thing group_by_tpg is trying to resolve). 
> > 
> > Ok, this is subtle; it's caused by the fact that
> > path_groups_in_order()
> > ignores the ordering of PGs with pgp->prio = PRIO_UNDEF (which will
> > be
> > the prio of a PG with only failed paths), whereas
> > select_path_group()
> > will ignore such PGs it in a different way - by never selecting
> > them.
> > I hope I understand correctly now.
> > 
> > I have to say this is confusing. We have different concepts of how
> > path
> > priority and path state together affect the PG priority, and we
> > apply
> > these different concepts in different parts of the code. I'm not
> > saying
> > it's wrong, but at the moment I'm too confused to tell if it's
> > right.
> 
> It might make sense to have these be even more different.  Perhaps
> select_path_group should stay the same but sort_pathgroups() and
> path_groups_in_order() should just look at the priorities of the
> paths
> that have a non-PRIO_UNDEF priority and use the total number of paths
> for tie-breakers. This would mean that they would order the
> pathgroups
> how they should be when everything is working correctly. 

That makes sense, yes.

> That way if
> something forces the kernel to pick a new pathgroup (which is likely
> the
> failure of all the paths in the current pathgroup), it will switch to
> the pathgroup that, all things being equal, should be the best.
> Obviously, if there is another pathgroup of equal prioity with more
> usable paths, multipath can tell it to switch to that one, but I
> assume
> that even without group_by_prio, multiple pathgroups with the same
> priority will be uncommon. 

I suppose it can happen with with group_by_tpg. There may be one
optimized and multiple non-optimized PGs, for example.

> 
> In this situation, it would make sense to call select_path_group()
> after
> calling group_paths(), since the first pathgroup might not currently
> be
> the best pathgroup, if its paths are down. At any rate, it's not
> something to worry about in this patchset.

Right. What I'm worrying about is mostly whether I (or someone else)
will understand the rationale of all this a few years from now. It's ok
to use different logic when we sort the PGs and when we select the best
one, but it should be explained why, and how, we do it. So please add
some comments explaining it.

Thanks
Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

Reply via email to