A few extra thoughts on this, since a lot of it is still based on my design from nearly 5 years ago ;)
On Wed, 2017-09-20 at 17:27 -0700, 'Konstantin Orekhov' via Foreman users wrote:
>
> Hmm, one generic question on this - according to above logic, if my
> managed host had crashed, say because it lost its HW RAID controller,
> for example, so it can't boot off the disk anymore thus resulting in
> PXE boot (given that BIOS boot order is set that way), correct?
> Now, by default, Foreman default pxeconfig file makes a system to
> boot off its disk, which in this particular situation will result in
> endless loop until some external (to Foreman) monitoring detects a
> system failure, then a human gets on a console and real
> troubleshooting starts only then.

This is absolutely true. We had, at one time, considered adding a state
machine (or similar) to Foreman, so that such things (as well as boot
loops in Kickstart, and so forth) could be detected, but it was never
completed.

> Now, with that in mind, I was thinking of moving actual OS
> provisioning tasks to Foreman as well. However, if crashed system
> would never be allowed to re-register (get discovered) because it is
> already managed by Foreman, the above flow is just not going to work
> anymore and I'd have re-think all flows. Are there specific reasons
> why this in place? I understand that this is how it is implemented
> now, but is there a bigger idea behind that? If so, what is it?

There were two goals - to prevent duplicates (if unprovisioned hosts
are rebooted, for example), and to allow recycling (delete a host from
Foreman, reboot it, and it'll be back in the discovered hosts list to
be re-used). Neither of these is insurmountable some other way, but
this was the easiest.

> Also, if you take my example of flows stitching for a complete system
> lifecycle management, what would you suggest we could do differently
> to allow Foreman to be a system that we use for both discovery and OS
> provisioning?

As Lukas says, a full refactor may well happen, and we'd love input on
that as we go forward.
For a workaround today, I'd probably lean towards a secondary plugin
that sits on top of Discovery and interacts with the registration
process - given your example, you could add a check for whether the
registration matches a host that's already provisioned, and take
further action if so. That might also be a good way to proof-of-concept
some ideas, before merging the code back into Discovery.

> Another thing (not as generic as above, but actually very applicable
> to my current issue) - if a client system is not allowed to register
> and given 422 error, for example, it keeps trying to register
> resulting in huge amount of work. This is also a gap, IMHO -
> discovery plug-in needs to do this differently somehow so rejected
> systems do not take away Foreman resources (see below for actual
> numbers of such attempts in one of my cluster).

I think I agree - the hosts should keep retrying until they get a
response from Foreman, but then actions can be taken. I'd probably be
in favour of keeping the retry (so that, say, if the offending MAC is
removed in Foreman, the host can register on the next retry), but
perhaps split the process into two calls. The first is a light "am I
registered?" call that returns true/false, and only if false would the
heavier registration call be made.

Does that work?

Thanks!
Greg
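
P.S. To make the two-call idea a bit more concrete, here is a minimal client-side sketch in Python. It is not how Discovery works today and it does not use any real Foreman API: `is_registered`, `register` and `collect_facts` are hypothetical callables standing in for the light check, the heavy registration call and fact gathering, and the attempt limit and delay are arbitrary.

```python
import time
from typing import Callable, Dict


def register_with_precheck(
    mac: str,
    is_registered: Callable[[str], bool],        # hypothetical light call
    register: Callable[[Dict[str, str]], bool],  # hypothetical heavy call
    collect_facts: Callable[[], Dict[str, str]], # hypothetical fact gathering
    max_attempts: int = 5,
    retry_delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry registration, but only send the heavy call when the
    light check says this MAC is not already known."""
    for _ in range(max_attempts):
        try:
            if not is_registered(mac):
                # Not known yet: now it is worth sending the full facts.
                if register(collect_facts()):
                    return "registered"
            # Already known (or registration refused): fall through and
            # retry later, in case the MAC is removed in the meantime.
        except ConnectionError:
            # No response at all: keep retrying, as today.
            pass
        sleep(retry_delay)
    return "gave_up"
```

The point of the split is that a rejected host still retries, so it can come back once the offending MAC is removed, but each retry costs only the cheap true/false lookup rather than a full fact upload.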