On Oct 1, 2023 11:14 AM, Christian Theune via Openipmi-developer <[email protected]> wrote:

Hi,

> On 1. Oct 2023, at 03:49, Corey Minyard <[email protected]> wrote:
>
> On Sat, Sep 30, 2023 at 11:14:01PM +0200, Christian Theune via Openipmi-developer wrote:
>> Hi,
>>
>> sorry if this isn’t directly a developers question, but I’ve run out of avenues after googling and looking around…
>>
>> We’re experiencing a weird system stability issue where the “log to SEL” doesn’t cut it: we see watchdog reboots but no kernel output whatsoever ending up in the SEL. (I’ve debugged this with Corey before and we found something to fix, but the watchdog events we’re experiencing still don’t get logged in more detail.)
>
> Can you not get kernel coredumps?

Unfortunately no, and I still have absolutely no idea why the watchdog triggers…

I am currently attached to dozens of servers that are part of a mysterious series of crashes, but none of them has crashed since I attached the SOL continuously. Just my kind of luck I guess … ;)


It might be a clue.  Can you make sure flow control is turned off on the SOL connection and console?  If you have "r" in the console= options (like console=ttyS0,115200n8r), the kernel can hang if the BMC stops taking characters.

You might want to make sure getty has RTS turned off, too.

The trouble is, of course, that you can lose characters because of a slow BMC.  But it's generally a bad idea to run a console with flow control enabled.
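
One way to check a machine for this is to look at the kernel command line. The following is only a rough sketch in Python: it assumes the documented serial console option form <baud><parity><bits><flow> (for example 115200n8r), and the function name is made up for illustration. It reports which console= devices request RTS flow control.

```python
import re

# Kernel serial console options look like "<baud><parity><bits><flow>",
# e.g. "115200n8r"; a trailing "r" requests RTS flow control.
_OPTIONS = re.compile(r"^(\d+)([noe])?(\d)?(r)?$")


def consoles_with_flow_control(cmdline: str) -> list[str]:
    """Return the console= devices whose options end in the "r" flow flag."""
    flagged = []
    for token in cmdline.split():
        if not token.startswith("console="):
            continue
        # Split "ttyS1,115200n8r" into the device and its option string.
        device, _, options = token[len("console="):].partition(",")
        match = _OPTIONS.match(options)
        if match and match.group(4):
            flagged.append(device)
    return flagged
```

On a running Linux system the input would be the contents of /proc/cmdline. An empty result only says the kernel console is not asking for flow control; getty and the SOL side still have to be checked separately.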

As we’re continuously updating our environment it might also be that we’ve successfully evaded a kernel bug that was haunting us … maybe … ;)

>>
>> I’m wondering: does anyone know of a “push” solution to instruct the BMC (mostly Supermicro in our case) to push SOL output proactively through some protocol like syslog?
>
> The SEL probably isn't big or fast enough to take system logs.  You
> could create something like this as part of printk, but I suspect that
> it would quickly overflow the SEL.

Yeah, I wasn’t thinking about the SEL but wondering whether serial output could be shipped in a push-manner from the BMC without having to attach and authenticate.


That would take some work in the BMC.

>> Otherwise we’d need to set up a central host with passwords for dozens of hosts to pull the SOL for logging and that doesn’t feel right either … -__
>
> I know people that do this; it's not terrible.  You do have all of your
> IPMI passwords in one place, that's the biggest issue, but IMHO you
> should be monitoring the output of your consoles, anyway.

Yeah, that’s what I’m pondering, too. IMHO it’s quite a bit terrible and thus I was wondering whether the BMC might have a built-in solution that would turn this upside down … but I guess not.

> I support a program called ser2net that is capable of making SOL
> connections, logging the output, and allowing connections to the
> console.  That would be a pretty complicated setup, but I can help you
> with it, if you like.

The multiplexing sounds great. I’ve built a small shell wrapper to manage SOL connections and their logging (and reconnecting if the BMC acts up) which works for now.
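
For the archive, the idea behind such a wrapper can be sketched roughly as below. This is not the actual wrapper, just a minimal Python illustration of "log the session, reconnect when it ends". The function names are made up, the ipmitool invocation (lanplus interface, password taken from the IPMI_PASSWORD environment variable via -E) is an assumption about the setup, and whether ipmitool is happy with a pipe instead of a terminal on stdin needs to be verified.

```python
import subprocess
import time


def sol_command(host: str, user: str) -> list[str]:
    """Assumed ipmitool call; -E reads the password from IPMI_PASSWORD."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
            "-E", "sol", "activate"]


def log_with_reconnect(command, log_path, retry_delay=10.0, max_sessions=None):
    """Run command repeatedly, appending its output to log_path.

    Each time the command exits (for example because the BMC dropped the
    session), wait retry_delay seconds and start it again. Returns the
    number of sessions run; with max_sessions=None it loops forever.
    """
    sessions = 0
    while max_sessions is None or sessions < max_sessions:
        with open(log_path, "ab") as log:
            # Mark session boundaries so reconnects are visible in the log.
            log.write(f"--- session {sessions + 1} started ---\n".encode())
            log.flush()
            # stdin is a pipe we never write to, so the child does not
            # see end-of-file on its input while the session is running.
            with subprocess.Popen(command, stdin=subprocess.PIPE,
                                  stdout=log, stderr=subprocess.STDOUT) as proc:
                proc.wait()
        sessions += 1
        time.sleep(retry_delay)
    return sessions
```

Something like log_with_reconnect(sol_command("bmc-host", "admin"), "/var/log/sol/bmc-host.log") would then run once per server, with the host name, user, and log path here being placeholders.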

From a design perspective I’d really love this to be push-based. I researched the DMTF site, but didn’t find anything there either … I guess I’m the odd one out then …

No idea.  So with your little wrapper connected everything seems to work ok.

Outside of the flow control thing, I have no idea.

-corey

Christian

--
Christian Theune · [email protected] · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick

_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer

