A few extra thoughts on this, since a lot of it is still based on my
design from nearly 5 years ago ;)

On Wed, 2017-09-20 at 17:27 -0700, 'Konstantin Orekhov' via Foreman
users wrote:
> 
> Hmm, one generic question on this - according to above logic, if my
> managed host had crashed, say because it lost its HW RAID controller,
> for example, so it can't boot off the disk anymore thus resulting in
> PXE boot (given that BIOS boot order is set that way), correct?

> Now, by default, Foreman default pxeconfig file makes a system to
> boot off its disk, which in this particular situation will result in
> endless loop until some external (to Foreman) monitoring detects a
> system failure, then a human gets on a console and real
> troubleshooting starts only then.

This is absolutely true. We had, at one time, considered adding a state
machine (or similar) to Foreman, so that such things (as well as boot
loops in Kickstart, and so forth) could be detected, but it was never
completed.

> Now, with that in mind, I was thinking of moving actual OS
> provisioning tasks to Foreman as well. However, if crashed system
> would never be allowed to re-register (get discovered) because it is
> already managed by Foreman, the above flow is just not going to work
> anymore and I'd have re-think all flows. Are there specific reasons
> why this in place? I understand that this is how it is implemented
> now, but is there a bigger idea behind that? If so, what is it?

There were two goals - to prevent duplicates (if unprovisioned hosts
are rebooted, for example), and to allow recycling (delete a host from
Foreman, reboot it, and it'll be back in the discovered hosts list to
be re-used). Neither of these is insurmountable some other way, but
this was the easiest.

> Also, if you take my example of flows stitching for a complete system
> lifecycle management, what would you suggest we could do differently
> to allow Foreman to be a system that we use for both discovery and OS
> provisioning?

As Lukas says, a full refactor may well happen, and we'd love input on
that as we go forward. For a workaround today, I'd probably lean
towards a secondary plugin that sits on top of Discovery and interacts
with the registration process - given your example, you could add a
check for whether the registration matches a host that's already
provisioned, and take further action if so. That might also be a good
way to build a proof of concept for some ideas, before merging the code
back into Discovery.
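
To make the check above a bit more concrete, here is a rough sketch of
the decision such a plugin might make. It is plain Python for
illustration only, not Foreman or Discovery code: ManagedHost,
classify_registration and the three outcome strings are all names I've
made up, and matching on MAC address is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ManagedHost:
    """Hypothetical record of a host that Foreman already manages."""
    name: str
    mac: str
    provisioned: bool


def normalize_mac(mac: str) -> str:
    """Lower-case the MAC and use ':' separators so lookups are consistent."""
    return mac.strip().lower().replace("-", ":")


def classify_registration(mac: str, managed: Dict[str, ManagedHost]) -> str:
    """Decide what to do with an incoming discovery registration.

    `managed` maps normalized MAC addresses to known hosts (assumed lookup).
    Returns one of three hypothetical outcomes:
      "discover"          - unknown MAC, register as a discovered host
      "flag_for_recovery" - MAC belongs to an already-provisioned host that
                            has fallen back to PXE, so something is wrong
      "reject"            - MAC is managed but not yet provisioned
    """
    host = managed.get(normalize_mac(mac))
    if host is None:
        return "discover"
    if host.provisioned:
        return "flag_for_recovery"
    return "reject"
```

The "flag_for_recovery" branch is where the "further action" would
hang: raising an alert, or kicking off whatever recovery flow you want.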

> Another thing (not as generic as above, but actually very applicable
> to my current issue) - if a client system is not allowed to register
> and given 422 error, for example, it keeps trying to register
> resulting in huge amount of work. This is also a gap, IMHO -
> discovery plug-in needs to do this differently somehow so rejected
> systems do not take away Foreman resources (see below for actual
> numbers of such attempts in one of my cluster).

I think I agree - the hosts should keep retrying until they get a
response from Foreman, but once they have one they can act on it
rather than blindly repeating the full registration. I'd probably be
in favour of keeping the retry (so that, say, if the offending MAC is
removed in Foreman, the host can register on the next retry), but
perhaps split the process into two calls. The first is a light "am I
registered?" call that returns true/false, and only if false would the
heavier registration call be made. Does that work?
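
As a sketch of that two-call idea, the client-side loop could look
roughly like the following. Again this is illustrative Python, not the
actual discovery image code: is_known and register stand in for the
light and heavy calls (neither endpoint exists today), and the retry
count and delay are arbitrary.

```python
import time
from typing import Callable


def register_when_allowed(
    is_known: Callable[[], bool],
    register: Callable[[], bool],
    max_attempts: int,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry registration, but only make the heavy call when it can succeed.

    is_known() is the cheap "am I registered?" check (hypothetical); it
    returns True while Foreman already has a host with this MAC.
    register() is the expensive call that uploads facts (hypothetical);
    it returns True on success.
    """
    for _ in range(max_attempts):
        # Cheap check first: skip the heavy call while Foreman knows this MAC.
        if not is_known():
            if register():
                return True
        # Keep retrying, so the host can register once the MAC is removed.
        sleep(delay_seconds)
    return False
```

While the host is still known to Foreman, each retry costs only the
light call; the heavy call happens once the offending MAC is gone.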

Thanks!
Greg
