I typically use ipmitool out-of-band to reboot machines that occasionally become unreachable over ssh because something went wrong.
Earlier today I was running a new, challenging parallel job across 32 servers and something went wrong. I suspect the nodes ran out of memory, after which a bunch of them became unresponsive. On some I was able to use ipmitool to reboot:

    /usr/bin/ipmitool -f ~/ipmi_pw -I lanplus -U root -H 172.16.0.13 power cycle

But on a bunch of them the BMC just doesn't respond to ipmitool:

    /usr/bin/ipmitool -f ~/ipmi_pw -I lanplus -U root -H 172.16.0.13 power status

just hangs.

Isn't this the whole point of the BMC: being able to connect and recover out-of-band? Even if (worst case) my kernel panicked and the TCP stack collapsed, shouldn't ipmitool still be able to talk to the BMC? I had checked, before the job crashed the nodes, that the BMCs were working and responsive to IPMI.

Am I doing something wrong here, or is this non-robustness a documented shortcoming of BMCs? Any comments from others using BMCs / IPMI are very welcome!

Verbose: no clues

More verbose:

    IPMI LAN host 172.16.0.14 port 623
    >> Sending IPMI command payload
    >>    netfn   : 0x06
    >>    command : 0x38
    >>    data    : 0x8e 0x04
    >> Sending IPMI command payload
    >>    netfn   : 0x06
    >>    command : 0x38
    >>    data    : 0x8e 0x04

Most verbose:

    IPMI LAN host 172.16.0.14 port 623
    >> Sending IPMI command payload
    >>    netfn   : 0x06
    >>    command : 0x38
    >>    data    : 0x8e 0x04
    BUILDING A v1.5 COMMAND
    >> IPMI Request Session Header
    >>   Authtype   : NONE
    >>   Sequence   : 0x00000000
    >>   Session ID : 0x00000000
    >> IPMI Request Message Header
    >>   Rs Addr    : 20
    >>   NetFn      : 06
    >>   Rs LUN     : 0
    >>   Rq Addr    : 81
    >>   Rq Seq     : 00
    >>   Rq Lun     : 0
    >>   Command    : 38

Are there any more options I can pass to ipmitool to increase verbosity and know at what exact point the hang is occurring?

-- 
Rahul
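
To see which BMCs hang without blocking on each one, the same power status command can be wrapped in a time limit. This is only a minimal sketch, assuming GNU coreutils timeout is installed; probe_bmc, IPMITOOL, PWFILE and LIMIT are names made up for illustration, and the ipmitool options are just the ones from the commands above.

```shell
#!/bin/sh
# Probe each BMC given on the command line with a hard time limit,
# so that one hung BMC cannot stall the whole loop.
# IPMITOOL, PWFILE, LIMIT and probe_bmc are illustrative names (assumptions).
IPMITOOL=${IPMITOOL:-/usr/bin/ipmitool}
PWFILE=${PWFILE:-$HOME/ipmi_pw}
LIMIT=${LIMIT:-10}   # seconds to wait before giving up on one BMC

probe_bmc() {
    # GNU timeout exits with status 124 when the time limit is reached.
    timeout "$LIMIT" "$IPMITOOL" -f "$PWFILE" -I lanplus -U root \
        -H "$1" power status >/dev/null 2>&1
    case $? in
        0)   echo "$1 ok" ;;
        124) echo "$1 timeout" ;;
        *)   echo "$1 error" ;;
    esac
}

# Prints one line per host: "<host> ok", "<host> timeout" or "<host> error".
for host in "$@"; do
    probe_bmc "$host"
done
```

Saved under some name (say check_bmc.sh, also hypothetical), it could be run as `sh check_bmc.sh 172.16.0.13 172.16.0.14`. It only separates the BMCs that answer from the ones that hang; it does not explain why a BMC stopped answering.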
_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel