Hey Brian,

Actually, I thought of a fix that was relatively tiny and easy. It won't require a workaround on the command line. I've got it in this branch:
svn co svn://svn.sv.gnu.org/freeipmi/branches/dellsolinstancecapacity

After checking out, do the normal

./autogen.sh; ./configure; make

then

ipmiconsole/ipmiconsole -h host -u user -p pass ...

as before. PLMK how it works for you.

BTW, for documentation purposes, what motherboard are you seeing this issue on?

Al

On Wed, 2012-01-11 at 10:54 -0800, Albert Chu wrote:
> Hey Brian,
>
> On Tue, 2012-01-10 at 20:14 -0800, Brian Lambert wrote:
> > I did another test, and have attached debug output.
> >
> > First, I rebooted the BMC (Dell iDRAC6) to make sure there were no
> > sessions active.
> >
> > I then established an initial SOL session, using the following command:
> > ./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize
> > --serial-keepalive
> >
> > So far, so good.
> >
> > Instead of killing the first session, I left it active and tried to start
> > a second session using the same command. That failed as expected, with a
> > "BMC Error" message. Debug output from that first reconnect attempt is
> > attached in ipmiconsole-reconnect1.txt.
>
> Ok, I see the problem.
>
> n003-bmc: =====================================================
> n003-bmc: IPMI 2.0 Get Payload Activation Status Response
> n003-bmc: =====================================================
> <snip>
> n003-bmc: IPMI Command Data:
> n003-bmc: ------------------
> n003-bmc: [ 4Ah] = cmd[ 8b]
> n003-bmc: [ 0h] = comp_code[ 8b]
> n003-bmc: [ 0h] = instance_capacity[ 4b]
> n003-bmc: [ 1h] = reserved[ 4b]
> n003-bmc: [ 1h] = instance_1[ 1b]
> n003-bmc: [ 0h] = instance_2[ 1b]
> n003-bmc: [ 0h] = instance_3[ 1b]
> n003-bmc: [ 0h] = instance_4[ 1b]
>
> The bug in Dell's implementation is the "0h = instance_capacity". This
> indicates the number of SOL instances that can be done at the same time.
> The fact that I ignore that it's 0 is a bug on my part (it should be > 0
> always if SOL can be done).
>
> This is then used to iterate on instance_1, instance_2, etc.
> to determine if SOL is currently activated. The 1h = instance_1
> indicates that SOL
> is active. But because instance_capacity is 0, I never look at it, so
> the calculation is that no SOL is currently active. ipmiconsole
> attempts to activate a SOL session as always, but b/c an SOL session is
> already active, the activation fails, so it tries again (assuming someone
> else raced with libipmiconsole and took SOL before it could). It checks
> again to see if SOL is active, notices it's not, tries to activate
> again, fails, and now we have a loop. Eventually there are too many
> failed activation attempts and libipmiconsole errors out.
>
> > I then tried to deactivate the existing session using the command:
> > ./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize
> > --serial-keepalive --deactivate
> >
> > That command completed without error, but the original session was still
> > active and responding to keystrokes. Debug output from that attempt is
> > attached in ipmiconsole-deactivate1.txt.
>
> Now this one makes sense. Given the above knowledge, libipmiconsole
> calculates that the SOL session is already deactivated, so it never
> attempts an actual SOL deactivation.
>
> I think this is very workaroundable, although I need to think about how
> to do it (via workaround option? without?) and how I can be
> careful/safe with it and not break other systems. I'll let ya know when
> I have something you can try and tell ya the branch it's on.
>
> Al
>
> > I then tried to activate a new session a second time. It failed with the
> > same error message as the first reconnect attempt. Debug output from the
> > second attempt is in ipmiconsole-reconnect2.txt.
> >
> > Thanks for your help. Let me know if you need further details or want me
> > to try anything else.
> >
> > thanks,
> > Brian
> >
> >
> > On Sun, 8 Jan 2012, Al Chu wrote:
> >
> > > Hi Brian,
> > >
> > > I've moved the IPMI portion of this thread to freeipmi-devel, since it's
> > > a bit more appropriate for this mailing list.
> > >
> > >> To start a session, I can use the following FreeIPMI command:
> > >> ./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize --serial-keepalive
> > >>
> > >> I can quit out of that session using the &. escape sequence, and
> > >> reconnect right away. But if I 'kill -9' that process, I get a
> > >> "[error received]: BMC Error" message when I try to connect with
> > >> another ipmiconsole command.
> > >
> > > This indicates an unexpected error code along the way. ipmiconsole
> > > probably noticed that the previous SOL session was activated and tried
> > > to deactivate it, with some error occurring at some point. Could you
> > > send the --debug output of ipmiconsole when you try to reconnect?
> > >
> > >> This is the same error message I get
> > >> when trying to connect when another session is already active. If I
> > >> then issue the command:
> > >> ./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize --serial-keepalive --deactivate
> > >> This completes without error, but I still can't reconnect to the
> > >> serial console.
> > >
> > > Can you give me the --debug output of the later connect attempt? I'd
> > > like to see why it can't connect again.
> > >
> > >> I get similar results when using ipmitool.
> > >> In that case, when I try
> > >> to reconnect, I get:
> > >> # ipmitool -U root -P calvin -H n003-bmc -I lanplus sol activate
> > >> Info: SOL payload already active on another session
> > >>
> > >> If I try to deactivate the existing session, I get:
> > >> # ipmitool -U root -P calvin -H n003-bmc -I lanplus sol deactivate
> > >> Info: SOL payload already de-activated
> > >
> > > I don't know the exact test situation you're trying, but you could be
> > > racing a bit in some of these scenarios. When you kill the previous
> > > session with "kill -9", the server/BMC does not immediately end the
> > > IPMI/SOL session. It lasts for a while longer until the server/BMC
> > > eventually times out. So that can explain why your first activate
> > > attempt indicates the session is already activated, but it's deactivated
> > > by the time you try to deactivate.
> > >
> > >> Once it's in this state, the only thing I've been able to do to regain
> > >> access to the serial console is reboot the BMC or wait for the session
> > >> to time out.
> > >>
> > >> I have the same experience when connecting to Dell iDRAC5 and iDRAC6,
> > >> both running the latest firmware. Al, if you'd like more information
> > >> or debug output from the freeipmi tools I'd be happy to provide it.
> > >
> > > Would like to get to the bottom of this.
> > >
> > > Al
> > >
> > >
> > >
> > > On Sun, 2012-01-08 at 20:13 -0800, lambert wrote:
> > >> After some additional experimentation, it looks like a direct ssh to
> > >> the Dell blade iDRAC (BMC) followed by a command to activate the
> > >> serial connection may be the way to go with these. I found that a
> > >> SIGKILL to the ssh session was sufficient to close the serial console
> > >> session, such that I could start another session without needing to
> > >> wait several minutes for the old session to time out.
> > >>
> > >> I still need to do some more testing, but Chris you may want to wait
> > >> before you spend too much time implementing the external process
> > >> cleanup coding. If I get this approach working robustly, a clean
> > >> shutdown of the external process will be less important.
> > >>
> > >>
> > >> As for the IPMI SOL issues:
> > >>
> > >> To start a session, I can use the following FreeIPMI command:
> > >> ./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize --serial-keepalive
> > >>
> > >> I can quit out of that session using the &. escape sequence, and
> > >> reconnect right away. But if I 'kill -9' that process, I get a
> > >> "[error received]: BMC Error" message when I try to connect with
> > >> another ipmiconsole command. This is the same error message I get
> > >> when trying to connect when another session is already active. If I
> > >> then issue the command:
> > >> ./ipmiconsole -h n003-bmc -u root -p calvin -W solpayloadsize --serial-keepalive --deactivate
> > >> This completes without error, but I still can't reconnect to the
> > >> serial console.
> > >>
> > >> I get similar results when using ipmitool. In that case, when I try
> > >> to reconnect, I get:
> > >> # ipmitool -U root -P calvin -H n003-bmc -I lanplus sol activate
> > >> Info: SOL payload already active on another session
> > >>
> > >> If I try to deactivate the existing session, I get:
> > >> # ipmitool -U root -P calvin -H n003-bmc -I lanplus sol deactivate
> > >> Info: SOL payload already de-activated
> > >>
> > >> Once it's in this state, the only thing I've been able to do to regain
> > >> access to the serial console is reboot the BMC or wait for the session
> > >> to time out.
> > >>
> > >> I have the same experience when connecting to Dell iDRAC5 and iDRAC6,
> > >> both running the latest firmware. Al, if you'd like more information
> > >> or debug output from the freeipmi tools I'd be happy to provide it.
> > >>
> > >> thanks,
> > >> Brian
> > >>
> > >> On Jan 7, 6:06 pm, Al Chu <[email protected]> wrote:
> > >>>> Thanks also for the FreeIPMI link. That list confirms the issue
> > >>>> I've been seeing with the Dell iDRACs not responding to the sol
> > >>>> deactivate. I've made Dell aware of the issue, but don't know if they
> > >>>> have any plans to fix it.
> > >>>
> > >>> When you do a "sol deactivate" does the original ipmitool session just
> > >>> hang forever? I imagine you're hitting a scenario where the original
> > >>> IPMI/SOL session cannot do SOL anymore, but can send/recv IPMI packets.
> > >>> The IPMI session can send IPMI keepalive packets and stay happy all day
> > >>> long, but no SOL traffic will ever be received. The only way to get a
> > >>> timeout is to send SOL data (i.e. type at prompt), so that the SOL data
> > >>> transfer eventually times out.
> > >>>
> > >>> I added a "serial keepalive" into ipmiconsole/libipmiconsole to try and
> > >>> deal w/ this situation. As the name suggests, you "keepalive" a session
> > >>> using SOL data instead of IPMI data so that the original sessions will
> > >>> eventually time out (and exit, which is the end goal). In FreeIPMI's
> > >>> ipmiconsole this is enabled w/ the "--serial-keepalive" option.
> > >>>
> > >>> I do believe ipmitool has a similar option "usesolkeepalive" (or
> > >>> something to that effect). It may be worth trying too.
> > >>>
> > >>> Al
> > >>>
> > >>>
> > >>>
> > >>> On Fri, 2012-01-06 at 20:43 -0800, lambert wrote:
> > >>>> I stand corrected, my second example does appear to work in regards to
> > >>>> trapping the signal while in interact mode. Not sure what I was doing
> > >>>> wrong the other day.
> > >>>
> > >>>> So I fleshed-out the code in the trap to have it log out of the cmc
> > >>>> and exit out of the expect script upon receiving a SIGHUP, and that
> > >>>> appears to work well.
> > >>>> It can't trap a SIGKILL so it will take a
> > >>>> modification to conman, as you suggested, to have an option for
> > >>>> sending different signal types. Another approach would be to send a
> > >>>> SIGHUP to all external processes by default, followed by a short wait,
> > >>>> and then a SIGKILL to clean up any stragglers. I can try playing with
> > >>>> that some, if you want to point me toward the relevant routine.
> > >>>
> > >>>> Thanks also for the FreeIPMI link. That list confirms the issue
> > >>>> I've been seeing with the Dell iDRACs not responding to the sol
> > >>>> deactivate. I've made Dell aware of the issue, but don't know if they
> > >>>> have any plans to fix it.
> > >>>
> > >>>> Thanks.
> > >>>
> > >>>> On Jan 6, 3:13 am, Chris Dunlap <[email protected]> wrote:
> > >>>>> As for IPMI SOL connections, ConMan uses FreeIPMI. I know Al Chu
> > >>>>> (FreeIPMI maintainer) has encountered bugs in several vendor
> > >>>>> implementations, and has implemented various workarounds when
> > >>>>> possible:
> > >>>
> > >>>>> http://www.gnu.org/software/freeipmi/freeipmi-bugs-issues-and-workaro...
> > >>>
> > >>>>> You could try the internal IPMI support to see if FreeIPMI is better
> > >>>>> able to cope with the Dell blades.
> > >>>
> > >>>>> conmand connects to an external process via a fork/exec, duping the
> > >>>>> ends of the child's socketpair onto stdin/stdout. It disconnects
> > >>>>> from the process by closing its side of the socketpair and sending
> > >>>>> a sigkill to the associated pid.
> > >>>
> > >>>>> The signal handler approach seems cleaner, but only if we're able
> > >>>>> to handle signals within the interact block. Just playing around at
> > >>>>> the shell, this seems to work:
> > >>>
> > >>>>> #!/usr/bin/expect --
> > >>>>> spawn $env(SHELL)
> > >>>>> trap {send_user " SIG[trap -name] "} {USR1 USR2}
> > >>>>> interact
> > >>>
> > >>>>> I'm not sure why your 2nd example doesn't work.
> > >>>>> I'll try to look at
> > >>>>> this some more in the next few days.
> > >>>
> > >>>>> -Chris
> > >>>
> > >>>>> On Thu, 2012-01-05 at 07:56am PST, lambert wrote:
> > >>>
> > >>>>>> What I'm trying to do in this case is issue the following commands to
> > >>>>>> connect to a virtual serial console, on a Dell blade, through the
> > >>>>>> chassis management controller.
> > >>>
> > >>>>>> ssh <cmc host>
> > >>>>>> connect -m server-<n>
> > >>>
> > >>>>>> At this point I would issue an interact command in the expect script.
> > >>>
> > >>>>>> Then, to close the connection requires sending a ^\ to close the
> > >>>>>> serial connection, followed by an 'exit' to exit out of the cmc ssh
> > >>>>>> connection.
> > >>>
> > >>>>>> Note that the Dell blades do support IPMI SOL. I'm currently using an
> > >>>>>> external script to drive ipmitool (hadn't realized conman now supports
> > >>>>>> ipmi sol connections internally). It's working for the most part, but
> > >>>>>> I'm hitting the same problem in that 1) I can't issue an 'sol
> > >>>>>> deactivate' to close the connection when conmand shuts down and 2) The
> > >>>>>> Dell BMCs don't appear to honor the 'sol deactivate' command anyway.
> > >>>
> > >>>>>> I'm having some general reliability issues with using IPMI SOL on the
> > >>>>>> Dell blades, so thought I'd try going through the above approach of
> > >>>>>> establishing a connection by way of the cmc.
> > >>>
> > >>>>>> I was thinking along the lines of a signal handler. How does conman
> > >>>>>> currently execute the external process, is it just a 'system' call?
> > >>>>>> Just wondering if the external process is already receiving a SIGKILL
> > >>>>>> when conmand shuts down.
> > >>>
> > >>>>>> Just now I experimented with creating a 'trap' inside my expect
> > >>>>>> script. It works, up until the interact block.
> > >>>>>> Once the interact
> > >>>>>> command is executed, the signal handler is no longer being run:
> > >>>
> > >>>>>> This works (I see 'Ouch!' printed with each SIGUSR1 signal):
> > >>>
> > >>>>>> set timeout -1
> > >>>>>> spawn /bin/sh
> > >>>>>> match_max 100000
> > >>>>>> send -- "ssh cmc1\r"
> > >>>>>> expect -exact "ssh cmc1\r
> > >>>>>> root@cmc1's password: "
> > >>>>>> send -- "#####\r"
> > >>>>>> expect -gl "\$ "
> > >>>>>> trap {send_user "Ouch!"} SIGUSR1
> > >>>
> > >>>>>> But once I add the 'interact' command, the signal handler stops
> > >>>>>> working, and a SIGUSR1 just causes the expect script to exit:
> > >>>>>> set timeout -1
> > >>>>>> spawn /bin/sh
> > >>>>>> match_max 100000
> > >>>>>> send -- "ssh cmc1\r"
> > >>>>>> expect -exact "ssh cmc1\r
> > >>>>>> root@cmc1's password: "
> > >>>>>> send -- "#####\r"
> > >>>>>> expect -gl "\$ "
> > >>>>>> trap {send_user "Ouch!"} SIGUSR1
> > >>>>>> interact
> > >>>
> > >>>>>> Thanks.
> > >>>
> > >>>>>> On Jan 5, 3:01 am, Chris Dunlap <[email protected]> wrote:
> > >>>>>>> No, ConMan currently has no mechanism to trigger an external process
> > >>>>>>> for cleanup before exiting.
> > >>>
> > >>>>>>> One possibility would be to have config keywords to specify, say,
> > >>>>>>> an ExecExitStr and ExecExitDelay. On exit, conmand would write
> > >>>>>>> the ExecExitStr string into the associated console byte stream,
> > >>>>>>> after which it would wait ExecExitDelay seconds before terminating.
> > >>>>>>> The expect script could specify this ExecExitStr pattern in its
> > >>>>>>> interact block, and upon matching it, perform the necessary sends &
> > >>>>>>> expects to prepare the remote console. The ExecExitDelay would give
> > >>>>>>> it time to run. One downside to this approach is that there is no
> > >>>>>>> way to prevent a connected user from typing the ExecExitStr pattern,
> > >>>>>>> thereby triggering the interact block in the expect script.
> > >>>
> > >>>>>>> Another possibility would be to specify a signal handler within
> > >>>>>>> the expect script, and conmand could signal the associated pid
> > >>>>>>> with an ExecExitSigNum signal before waiting ExecExitDelay seconds
> > >>>>>>> to terminate. But I'd have to do some experimentation to see if I
> > >>>>>>> could craft an appropriate signal handler for an expect script.
> > >>>
> > >>>>>>> Can you elaborate on what you would like to do in order to cleanly
> > >>>>>>> close such a connection?
> > >>>
> > >>>>>>> -Chris
> > >>>
> > >>>>>>> On Wed, 2012-01-04 at 02:41pm PST, lambert wrote:
> > >>>
> > >>>>>>>> Is there a way to trigger a clean exit of an external console process,
> > >>>>>>>> when the conman daemon is shut down? Say I'm using the ssh.exp
> > >>>>>>>> script, when the conman daemon is shut down (/etc/init.d/conman stop),
> > >>>>>>>> I'd like to have the ssh.exp script issue commands to cleanly close
> > >>>>>>>> the connection.
> > >>>
> > >>>>>>>> I'm trying to work around a problem with some Dell blades where if the
> > >>>>>>>> virtual serial console connection is not terminated cleanly, I have to
> > >>>>>>>> wait several minutes or reboot the BMC in order to regain access.
> > >>>
> > >>>>>>>> thanks.
--
Albert Chu
[email protected]
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

_______________________________________________
Freeipmi-devel mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/freeipmi-devel
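
For anyone following the thread, the instance_capacity problem described above can be sketched roughly as follows. This is only an illustration in Python, not the FreeIPMI code: the function name sol_is_active, the tolerate_zero_capacity flag, and the choice to scan every instance flag when the capacity reads 0 are all assumptions made for the example. The actual change in the dellsolinstancecapacity branch may work differently.

```python
from typing import Sequence


def sol_is_active(instance_capacity: int,
                  instance_flags: Sequence[int],
                  tolerate_zero_capacity: bool = False) -> bool:
    """Decide whether any SOL instance is reported as activated.

    instance_capacity and instance_flags mirror the fields shown in the
    Get Payload Activation Status debug output (instance_1, instance_2, ...).
    tolerate_zero_capacity is a hypothetical switch for this sketch only.
    """
    limit = instance_capacity
    if limit == 0 and tolerate_zero_capacity:
        # Assumed workaround: a capacity of 0 is treated as bogus, so every
        # reported instance flag is inspected instead of none of them.
        limit = len(instance_flags)
    limit = min(limit, len(instance_flags))
    # Only the first `limit` instance flags are consulted.
    return any(instance_flags[:limit])


# Values taken from the debug output quoted in the thread.
dell_capacity = 0
dell_flags = [1, 0, 0, 0]

# Strict reading: capacity 0 means no flag is examined, so the active
# session goes unnoticed and activation keeps being retried.
strict = sol_is_active(dell_capacity, dell_flags)

# Tolerant reading: instance_1 is seen, so a deactivate can be attempted.
tolerant = sol_is_active(dell_capacity, dell_flags, tolerate_zero_capacity=True)
```

With the strict reading the session is judged inactive, which matches both symptoms reported in the thread: the activate attempt loops until it errors out, and --deactivate returns without actually deactivating anything.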
