Ok,

After studying the problem a little more, I have a question that I cannot 
answer.

Why was si_netbootmond created, when a better place for this monitoring 
would have been the monitoring source itself, si_monitor?

Indeed, si_monitor learns of the "imaged" status in real time, whatever the 
deployment method is (bittorrent, rsyncd, flamethrower, ...).

In my previous post, I thought that looking for status 102 (rebooted) was the 
solution. Unfortunately, if the client is set to netboot, a reinstall loop 
will occur before the rebooted status has had a chance to be sent to 
si_monitor.

So I think a rock-solid solution would be to implement the following 
algorithm in si_monitor.

Have an si_monitor configuration parameter to enable or disable the 
netbootmond feature.
If it is enabled, when si_monitor receives "imaged" (status 100) (or maybe 
finalizing (101) or even rebooting (104)), and if NET_BOOT_DEFAULT is set to 
local, it would run:
si_mkclientnetboot --localboot --clients "<the client>"
and optionally start a (configurable) timer.

When the timer expires, it would check /var/lib/systemimager/clients.xml for 
that client, and if the "rebooted" (102) status is not there, it would assume 
that the reboot failed (bad boot loader, wrong fstab, garbage from a 
postinstall script, ...) and revert the client to netboot so that another 
imaging attempt can occur.
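
As a rough sketch of the timer part (not tested against si_monitor itself), the bookkeeping could look like the code below. The function names on_status and expired_clients and the %pending hash are hypothetical; they are not existing si_monitor code. For simplicity the sketch tracks the "rebooted" message in memory instead of re-reading clients.xml when the timer expires.

```perl
use strict;
use warnings;

# Status codes as named in this thread.
use constant { STATUS_IMAGED => 100, STATUS_REBOOTED => 102 };

# Hypothetical helper, called for each status message received.
# $pending maps a client name to the epoch time at which its reboot timer expires.
# Returns 1 if the client should be switched to local boot now, 0 otherwise.
sub on_status {
    my ($pending, $name, $status, $timeout, $now) = @_;
    if ($status == STATUS_IMAGED) {
        $pending->{$name} = $now + $timeout;    # arm (or re-arm) the timer
        return 1;
    }
    # A "rebooted" message confirms the reboot, so the timer is dropped.
    delete $pending->{$name} if $status == STATUS_REBOOTED;
    return 0;
}

# Hypothetical helper, called periodically.
# Returns the names of clients whose timer has expired (sorted) and forgets
# them; the caller would revert those clients to netboot.
sub expired_clients {
    my ($pending, $now) = @_;
    my @late = grep { $pending->{$_} <= $now } sort keys %$pending;
    delete @{$pending}{@late};
    return @late;
}
```

The caller would run si_mkclientnetboot --localboot for a client when on_status returns 1, and revert each name returned by expired_clients to netboot.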

I admit that reverting to netboot could lead to an endless loop of failed 
reimaging, but we could think about a netboot entry that waits for a keypress 
with no timeout, or a netboot entry that powers off, or any other 
configurable "on-fail-to-reboot" behavior.

Anyway, the timer option aside, the main question here is: why not integrate 
si_netbootmond into si_monitor? We would benefit from real-time client status 
without active waiting, and we would be independent of the deployment method. 
It is also not that difficult to code, mainly at line si_monitor:390:

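# Status 100 means "imaged": switch this client to local boot.
if (($client->{'status'} == 100) && $netbootmond_option_enabled) {
    # The list form of system() bypasses the shell, so no quoting is needed.
    system('si_mkclientnetboot', '--localboot', '--clients', $client->{'name'});
}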
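
To also honour the NET_BOOT_DEFAULT condition mentioned earlier, the test could be wrapped in a small helper, roughly as sketched below. This is only an illustration: the option name NETBOOTMOND_ENABLED is hypothetical, and I am assuming the settings are available as simple "KEY = value" lines.

```perl
use strict;
use warnings;

# Parse simple "KEY = value" lines, ignoring blank lines and # comments.
# (The "KEY = value" layout is an assumption made for this sketch.)
sub parse_conf {
    my ($text) = @_;
    my %conf;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|$)/;
        $conf{$1} = $2 if $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
    }
    return \%conf;
}

# Returns 1 when a client reporting "imaged" (100) should be switched to
# local boot, 0 otherwise. NETBOOTMOND_ENABLED is a hypothetical option name.
sub should_set_localboot {
    my ($status, $conf) = @_;
    return 0 unless lc($conf->{NETBOOTMOND_ENABLED} // 'no') eq 'yes';
    return 0 unless lc($conf->{NET_BOOT_DEFAULT} // 'net') eq 'local';
    return $status == 100 ? 1 : 0;
}
```

Both options default to the conservative value here, so nothing changes unless the feature is explicitly enabled and NET_BOOT_DEFAULT is local.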

What do you think?

Best regards,

Olivier.

On Wednesday 23 July 2014 16:37:52 Olivier LAHAYE wrote:
> Dear all,
> 
> I'm working on si_netbootmond (in fact, on the OSCAR side) and discovered
> that si_netbootmond relies on the imaging method being rsync.
> It monitors the rsyncd log file for magic words stating that imaging is
> complete.
> 
> There are 2 problems here:
> 1/ It doesn't work for deployment using flamethrower / multicast or
> bittorrent
> 2/ If a client is successfully imaged, this doesn't mean that it is able to
> reboot (bad image, bad bootloader, ...). In this situation, setting local
> boot is wrong, as we certainly want to boot from the net again in order to
> re-image.
> 
> I see 3 solutions:
> 1/ replace the current code that scans /var/log/systemimager/rsyncd for
> magic words: /scripts\/imaging_complete_?([\.0-9]+)?/ with something like:
> 
> use File::Monitor;
> my $monitor = File::Monitor->new();
> $monitor->watch('/var/lib/systemimager/clients.xml');
> then, in a loop, search for clients with status 102 (REBOOTED)
> and run:
> si_mkclientnetboot --localboot --clients "<the client>"
> 
> 2/ Keep the current code and add an option to monitor clients.xml instead
> 
> 3/ drop si_netbootmond or leave it untouched, and add an option in
> si_monitor to enable switching the netboot to local. Indeed, si_monitor
> would not need to actively check a file: it blocks on a socket, listening
> for messages. If it receives the rebooted message, it could call:
> si_mkclientnetboot --localboot --clients "<the client>"
> 
> Technically, solution 3 has the advantage of being passive, while solutions
> 1 and 2 actively poll for node status changes. The problem with solution 3
> is that I don't know whether it is logical for si_monitor to be able to
> update a client's netboot setting, and if si_monitor is not started, there
> is no way to update netboot.
> 
> What is the best solution?
> 
> Regards,
> 
> Olivier.


Regards,

Olivier.

-- 
    Olivier Lahaye
    DRT/LIST/DIR

_______________________________________________
sisuite-devel mailing list
sisuite-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sisuite-devel
