On Tue, Apr 05, 2011 at 05:14:53PM +0200, Rene Nussbaumer wrote:
> On Tue, Apr 5, 2011 at 4:58 PM, Iustin Pop <[email protected]> wrote:
> > On Tue, Mar 29, 2011 at 03:57:42PM +0200, René Nussbaumer wrote:
> >> As the code for failover for checking is almost identical it's an easy
> >> task to switch it over to the TLMigrateInstance. This allows us to
> >> fallback to failover if migrate fails prereq check for some reason.
> >>
> >> Please note that everything from LUInstanceFailover.Exec is taken over
> >> unchanged to TLMigrateInstance._ExecFailover, only with adaption to
> >> opcode fields and variable referencing, but not in logic. There still
> >> needs to go some effort into merging the logic with the migration (for
> >> example DRBD handling). But this should happen in a separate iteration.
> >
> > Mmm, thanks for doing this. Sorry for the slow review, the patch is a
> > bit long :)
>
> Agreed, don't worry.
>
> > Question (unrelated to the patch itself):
> >
> > - what happens if the instance is ADMIN_up but ERROR_down?
>
> So in this case we won't detect it until the very end of CheckPrereq
> where we check if a migration is possible by doing an RPC call which
> looks at the actual instance state on the node. If fallback is
> specified we switch to failover (and mention that) otherwise if
> fallback is now allowed we fail the operation as before: «Can't
> migrate, instance is offline»
Cool, thanks.
> > Also, I wonder if we shouldn't merge the failover logic in the migration
> > logic, as:
> >
> > - live migration
> > - non-live migration
> > - offline (=failover)
> >
> > But that's for later, as you say.
>
> Yeah I though about the same by going by the mode, I'll keep that in
> mind. But I want to merge the logic in a second step, I found it just
> too risky to also change the logic and then if it doesn't work as
> expected anymore you start wondering what's wrong.
Agreed.
> >> + if not instance.admin_up and not self.failover and self.fallback:
> >> + self.lu.LogInfo("Instance is marked down, fallback allowed,
> >> switching"
> >> + " to failover")
> >> + self.failover = True
> >
> > Hmm… This means we won't have back in the opcode a way to determine
> > (programatically) whether a migration or a failover was done, but only
> > by parsing the logs.
>
> Good point. Is this concerning us at the moment?
>From a practical point of view, not sure. From the style/design point of
view, we (I?) try to keep any changes in the operation compared to the
original opcode reflected back into the opcode, so that it can be
inspected later.
> What solutions would
> we have? Introducing another opcode?
Let's leave it aside for now. We first have to see how/if we merge
migration and failover at all.
thanks,
iustin