You could run a periodic check from crontab (or similar) along the lines of "if the node is offline, the list of jobs running on it is empty, and [a file exists in some magic place], reboot". Have the file removed on startup. Then, to trigger a reboot, submit a job as the special high-priority user that creates the file, and offline the node.
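A rough sketch of that periodic check, in case it helps. It assumes a Torque-style `pbsnodes` whose per-node output has `state = ...` and `jobs = ...` lines, and that the name returned by `gethostname()` is the name the server knows the node by; neither is confirmed for your setup. The flag path and the function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Cron-driven "reboot once drained" check (illustrative sketch only)."""
import os
import socket
import subprocess

# Hypothetical marker path; stands in for "some magic place".
# A startup script (e.g. rc.local) is expected to delete it on boot.
FLAG_FILE = "/var/run/reboot-when-drained"


def parse_node_status(text):
    """Extract (states, jobs) from pbsnodes-style 'key = value' lines."""
    states, jobs = set(), []
    for line in text.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if not sep:
            continue  # node name line or anything without ' = '
        if key == "state":
            states = {s.strip() for s in value.split(",")}
        elif key == "jobs":
            jobs = [j.strip() for j in value.split(",") if j.strip()]
    return states, jobs


def should_reboot(status_text, flag_exists):
    """True only if the flag file exists, the node is offline, and no jobs remain."""
    states, jobs = parse_node_status(status_text)
    return bool(flag_exists and "offline" in states and not jobs)


def main():
    node = socket.gethostname()  # assumption: matches the node name in the server
    try:
        result = subprocess.run(
            ["pbsnodes", node], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return  # cannot query node status: do nothing this round
    if should_reboot(result.stdout, os.path.exists(FLAG_FILE)):
        subprocess.run(["/sbin/reboot"], check=True)


if __name__ == "__main__":
    main()
```

Run it from root's crontab every few minutes. The job sent as the high-priority user only has to create the flag file; the check does nothing until the node is also offline and empty.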
It'll eventually reboot after being offlined and being out of jobs, so you drop human intervention without having to do much else.

- Rich

On Mon, Nov 15, 2010 at 10:15 AM, Arnau Bria <[email protected]> wrote:
> On Mon, 15 Nov 2010 09:03:22 -0600
> Charles Johnson wrote:
>
>> On Nov 15, 2010, at 8:47 AM, Arnau Bria wrote:
> Hi Charles,
>
>> > At some time we'd like to send a kind of job that reboots the host.
>> > But before rebooting the host we'd like to "drain" the node and
>> > don't lose any job while rebooting.
>>
>> Why not just mark the node off-line, and when the jobs are finished
>> reboot the node?
>
> That's our current procedure.
>
> But, with the reboot scenario I previously described before, we could
> eliminate human intervention on reboot and checking node "drain".
>
> *I did not explain, but nodes went online/offline when rebooting
> automatically by job and local rc.local file.
> So it's interesting for us that a reboot (for kernel update, i.e) could
> be done by sending as many jobs as nodes we have.
>
>> ~Charles~
> Many thanks for your replies,
>
> Cheers,
> Arnau
> _______________________________________________
> mauiusers mailing list
> [email protected]
> http://www.supercluster.org/mailman/listinfo/mauiusers
