On Mon, 2006-11-13 at 13:55 -0800, Al Chu wrote:
> Hi Jean,
> 
> > Here is what I propose.   Instead of using a getDeviceId for the
> > keepalive mechanism, we could use a simple empty SOL transmit.  (Valid
> > sequence number, no acknowledge, length of 0).  This will trig no work
> > on the BMC side and will no generate also an answer.  
> 
> This solution would bother me a bit.  I don't think the IPMI spec is
> specific enough on the behavior that should occur if an empty SOL packet
> is sent.  Some other implementations may return an error instead.

I concur, it's a risky assumption.  It just so happens his strategy uses
a path of code that's better prepared for unexpected data flows, the
correct answer is to fix the path the keepalive function uses, since
every other IPMI implementation I know of uses non SOL messages for
keepalives, it's a more commonly accepted behavior.

> 
> > It looks like IPMITOOL is not done in a way that question and answer 
> > can be dissociated.  There is no guarantee that packet of a different 
> > payload type could be received in between of question/answer of 
> > another type… or of the same type.
> 
> I certainly don't know the code as well as others, but it seems in
> ipmi_lan_poll_recv() in lanplus.c, it does look as though packets are
> parsed so we know if the payload type is type IPMI or SOL.  So I think a
> method to differentiate between different payload types received is
> done?

ipmi_lan_poll_recv knows, but that isn't sufficient to protect against
the problem.  I've attached what I think could be a decent fix against
todays CVS, and would prevent one extra retry on the part of BMCs that
is happening today, and incorrect parsing of an SOL packet.  Detailed
explanation follows.  I haven't tested it beyond making sure it
compiles, so someone please provide feedback.

I put a bandaid in ipmi_sol.c in a patch that was accepted back in July
in output because I saw GetDeviceID content spitting out as SOL data,
Now that I look harder, I see it must be in ipmi_red_pill's call to
intf->recv_sol that gets the incorrect rsp packet (all other calls are
well guarded I can see), ipmi_lanplus_recv_sol assumes whatever rsp gets
returned by ipmi_lan_poll_recv() is SOL data, which might not be the
case.  From the clients side, it's fairly deterministic, but it isn't
designed to be robust to the state of the BMC it is connected to.

The problem happens when the BMC is sending output (at exactly the wrong
time).  The keepalive function sends 'getDeviceID', and before it gets
an answer, ipmitool receives an SOL payload packet from the BMC.
ipmi_lanplus_send_payload is state sensitive, but bases the state of
what it expects on the outgoing packet, so it had just sent a
getdevice_id, so it goes to the very simple:
               rsp = ipmi_lan_poll_recv(intf);
                        if (rsp)
                                break;
So the SOL payload from the bmc is passed into ipmi_lan_poll_recv, which
is stateful based on the contents of the received packet.  It jumps down
and happily populates the rsp for SOL processing.  (note that SOL
payload packets do not explicitly set ccode in the rsp struct, afaict,
so the rsp packet will have something to do with the current packet).
Anyway, so this struct populated as SOL payload gets returned up to
ipmi_lanplus_keepalive, and it checks that rsp is not NULL (it shouldn't
be), and if rsp->ccode is greater than zero, which is generally not the
case since ccode is probably set to whatever the last time a normal
packet was sent and if you are still up it means it was probably zero.
Now the client never acks the SOL data packet, so the SOL data should
get retried by the BMC side unless the BMC doesn't retry correctly, so
the data shouldn't be lost.    

Ok, so assume keepalive happens to read 0 at ccode, and jumps into
red_pill.  red_pill asks for some sol data, and gets the response to the
until-now ignored get device id.  Now you have get_device_id being
processed through recv_sol, ack_sol_packet fortunately ignores it,
check_sol_packet_for_new data similarly ends up returning zero, but in
the end ipmi_lanplus_recv_sol will return the GetDeviceId rsp packet,
which is passed to output, and in CVS this too is harmless.  In the end
here, at least in CVS, this is strange, but innocuous behavior overall.
diff -urN ipmitool/src/plugins/lanplus/lanplus.c ipmitool.fixed/src/plugins/lanplus/lanplus.c
--- ipmitool/src/plugins/lanplus/lanplus.c	2006-11-18 12:28:17.000000000 -0500
+++ ipmitool.fixed/src/plugins/lanplus/lanplus.c	2006-11-18 12:07:38.000000000 -0500
@@ -3460,6 +3460,14 @@
 
 	rsp = intf->sendrecv(intf, &req);
 
+	while (rsp != NULL && is_sol_packet(rsp)) {
+		/* rsp was SOL data from BMC instead of our answer, act like send_payload does when it happens to it */
+		if (rsp->data_len)
+			intf->session->sol_data.sol_input_handler(rsp);
+		/* Get another packet */	
+		rsp = intf->sendrecv(intf, &req);
+	}
+	/* rsp is not sol */
 	if (rsp == NULL)
 		return -1;
 	if (rsp->ccode > 0)
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel

Reply via email to