As promised the patch.

keepalive checks type of rsp, while the rsp is sol, it acks it, checks
for new data, passes to input_handler, and grabs whatever is next on the
receive path.  If it doesn't get an answer in this loop, it decides the
SOL data transaction was a good enough keepalive.

There is a slim possiblity as written of the Get Device ID transaction
not being complete still.  If the get device id command request or
response got lost on the network at the exact same time SOL data did get
in, get device id command will not be retried.  Instead, the keepalive
function assumes if it detects this unlikely scenario, that the
connection must be alive and the BMC timeout reset since SOL data is
being received and acked.  This shouldn't leak data into the SOL output
handler, but still the SOL output handler remains robust just in case. 

If you wanted to be paranoid about the get device id that really isn't
important anyway, you could change the new return 0 to rsp =
intf->sendrecv(intf, &req) to start another retry of the heartbeat
command, but I think that's not optimal/needed here.

I tried the patch on my system and futzed with the keepalive and
verified it did prevent the getdevice id from being in the wrong place
and the BMC didn't have to retry the badly timed packet.

> For some reason I didn't get the mail, but I'll reply here.
> 
> 
> 
> > OK, I can revert the solution but I don't understand why this solution
> > does not work.
> > However, sending an empty packet is a valid solution.  There is no
> mention in the IPMI spec
> > that the packet sent must be a get device id or anything else.
> 
> The spec doesn't speak to any keepalive strategy at all, so whatever
> strategy successfully can do a full transaction is all the guidance you
> can get from the spec.
> 
> The problem with empty SOL packets is the spec doesn't speak to those at
> all (they have no apparent relevant use as far as it is concerned),
> and so how they are dealt with is up for interpretation.  The spec says
> SOL payload must be acknowledged in terms of number of bytes, but if a
> packet contains no data, it seems two reactions are spec compliant,
> acking 0 and not acking at all.  The BMCs I work with happened to choose
> the latter, only acking if they have non-zero number of bytes to ack. A
> lot of common network practices aren't directly addressed in the spec
> (i.e. non-SOL traffic has no reliable way for a party to distinguish
> retry from new command, and keepalive is left up to the implementor).  I
> agree common sense would have the ack as a two-part deal, an
> acknowledgement of the packet and of the data contained in the packet,
> but the spec only speaks specifically to acking the data of the packet.
> 
> Besides, there may be another bug to look at, but the pcap files I
> looked at demonstrate that despite the BMC never sending an ACK or any
> traffic back, exhausting ipmitool retries, ipmitool never decides the
> connection is dead.
> 
> 
> > Do you have a plan to modify the code to correctly support the SOL
> traffic when the keep
> > alive data is sent?
> 
> Yes, I submitted a patch a while back to the list, but it was missing a
> couple of steps to make it complete.  I'll put together a single
> complete patch against lanplus.c and send it in a few after I do basic
> testing on it, so I'd wait for that, but the answer is a straightforward
> one that is confined to the lanplus keepalive function.
> 
> > Thanks!
> > Jean-Michel Audet
> 
> On Wed, 2006-11-22 at 10:51 -0500, Jarrod B Johnson wrote:
> > On Mon, 2006-11-13 at 13:55 -0800, Al Chu wrote:
> > > Hi Jean,
> > >
> > > > Here is what I propose.   Instead of using a getDeviceId for the
> > > > keepalive mechanism, we could use a simple empty SOL transmit.  (Valid
> > > > sequence number, no acknowledge, length of 0).  This will trig no work
> > > > on the BMC side and will no generate also an answer.
> > >
> > > This solution would bother me a bit.  I don't think the IPMI spec is
> > > specific enough on the behavior that should occur if an empty SOL packet
> > > is sent.  Some other implementations may return an error instead
> >
> > I note that CVS accepted that patch, and can confirm that the strategy
> > causes problems on at least one BMC implementation, and in the code
> > itself.
> >
> > The problem is that the BMC never acknowledges the keepalive packet, and
> > so every 30 seconds you have a huge stall because ipmitool exhausts
> > retries, but doesn't close connection.
> >
> > BTW, this also means that ipmitool didn't close the connection when it
> > never received a keepalive response, so on this count also there seems
> > to be a problem that it fails to detect what could be a dead connection.
> >
> >
> > -------------------------------------------------------------------------
> > Take Surveys. Earn Cash. Influence the Future of IT
> > Join SourceForge.net's Techsay panel and you'll get the chance to share your
> > opinions on IT & business topics through brief surveys - and earn cash
> > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> > _______________________________________________
> > Ipmitool-devel mailing list
> > Ipmitool-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/ipmitool-devel
> 
> 
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share your
> opinions on IT & business topics through brief surveys - and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> Ipmitool-devel mailing list
> Ipmitool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ipmitool-devel
diff -urN ipmitool-11271146/src/plugins/lanplus/lanplus.c ipmitool/src/plugins/lanplus/lanplus.c
--- ipmitool-11271146/src/plugins/lanplus/lanplus.c	2006-09-12 19:23:28.000000000 -0400
+++ ipmitool/src/plugins/lanplus/lanplus.c	2006-11-27 12:03:57.000000000 -0500
@@ -3459,6 +3459,17 @@
 		return 0;
 
 	rsp = intf->sendrecv(intf, &req);
+	while (rsp != NULL && is_sol_packet(rsp)) {
+                /* rsp was SOL data instead of our answer */
+                /* since it didn't go through the sol recv, do sol recv stuff here */
+                ack_sol_packet(intf, rsp);
+                check_sol_packet_for_new_data(intf, rsp);
+                if (rsp->data_len)
+                        intf->session->sol_data.sol_input_handler(rsp);
+		rsp = ipmi_lan_poll_recv(intf);
+		if (rsp == NULL) /* the get device id answer never got back, but retry mechanism was bypassed by SOL data */
+			return 0; /* so get device id command never returned, the connection is still alive */
+        }
 
 	if (rsp == NULL)
 		return -1;
Binary files ipmitool-11271146/src/plugins/lanplus/.lanplus.c.swp and ipmitool/src/plugins/lanplus/.lanplus.c.swp differ
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Ipmitool-devel mailing list
Ipmitool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ipmitool-devel

Reply via email to