Hi,

> On 1. Oct 2023, at 22:59, Corey Minyard <[email protected]> wrote:
> 
> On Oct 1, 2023 11:14 AM, Christian Theune via Openipmi-developer 
> <[email protected]> wrote:
> Hi,
> > On 1. Oct 2023, at 03:49, Corey Minyard <[email protected]> wrote:
> > 
> > On Sat, Sep 30, 2023 at 11:14:01PM +0200, Christian Theune via 
> > Openipmi-developer wrote:
> >> Hi,
> >> 
> >> sorry if this isn’t directly a developer’s question, but I’ve run out of 
> >> avenues after googling and looking around… 
> >> 
> >> We’re experiencing a weird system stability issue where the “log to SEL” 
> >> doesn’t cut it: we see watchdog reboots but no kernel output whatsoever 
> >> ending up in the SEL. (I’ve debugged this with Corey before and we found 
> >> something to fix but the watchdog events we’re experiencing still don’t 
> >> get logged in more detail.)
> > 
> > Can you not get kernel coredumps?
> Unfortunately no, and I still have absolutely no idea why the watchdog 
> triggers… I have currently attached dozens of servers that are part of a 
> mysterious series of crashes but they didn’t crash after I attached the SOL 
> continuously. Just my kind of luck I guess … ;)
> 
> It might be a clue.  Can you make sure flow-control is turned off on the SOL 
> connection and console?  If you have "r" on the console= command (like 
> console=115200n81r) , if the BMC stops taking characters you can hang the 
> kernel.
> 
> You might want to make sure getty has RTS turned off, too.
> 
> The trouble is, of course, that you can lose characters because of a slow 
> BMC.  But it's generally a bad idea to run a console with flow control 
> enabled.
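
The flow-control check suggested above can be sketched in a few lines of Python. This is only a rough illustration: it assumes the serial options in console= follow the usual <baud><parity><bits>[r] pattern, where a trailing "r" requests RTS flow control, and the function names are made up for this example.

```python
import re
from typing import List


def consoles_with_rts_flow_control(cmdline: str) -> List[str]:
    """Return the console= arguments whose serial options end in 'r'."""
    flagged = []
    for arg in cmdline.split():
        if not arg.startswith("console="):
            continue
        value = arg[len("console="):]
        # Options follow the device name after a comma (ttyS1,115200n8r).
        # Without a comma the whole value is checked, which also covers
        # the shortened form quoted in this thread (115200n81r).
        options = value.rpartition(",")[2]
        # Assumed option layout: baud rate, optional parity, optional
        # bit count, then 'r' for RTS flow control.
        if re.fullmatch(r"\d+[noe]?\d*r", options):
            flagged.append(arg)
    return flagged


def running_kernel_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return handle.read().strip()
```

Calling consoles_with_rts_flow_control(running_kernel_cmdline()) on an affected host lists the console arguments that would need the "r" removed. It does not look at getty, so that setting still has to be checked separately.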

Sorry, that might have been a misunderstanding: I’m not catching the crashes 
currently because all the machines that used to crash now seem to not want to 
crash anymore. I guess we’re on a Heisenbug here. Getting output from the SOL 
works absolutely fine, so I expect to see a kernel crash in the SOL once it 
happens.

I somewhat suspect that we’ll find another bug that keeps those specific 
crashes from appearing in the SEL, though … 

And then again: maybe it’s not a Heisenbug, but maybe whatever caused the 
crashes has been fixed in between and I’ll never know … ;)

> As we’re continuously updating our environment it might also be that we’ve 
> successfully evaded a kernel bug that was haunting us … maybe … ;)
> >> 
> >> I’m wondering: does anyone know of a “push” solution to instruct the BMC 
> >> (mostly Supermicro in our case) to push SOL output proactively through 
> >> some protocol like syslog? 
> > 
> > The SEL probably isn't big or fast enough to take system logs.  You
> > could create something like this as part of printk, but I suspect that
> > it would quickly overflow the SEL.
> Yeah, I wasn’t thinking about the SEL but wondering whether serial output 
> could be shipped in a push-manner from the BMC without having to attach and 
> authenticate.
> 
> That would take some work in the BMC.

That’s what I thought. Not a promising avenue I guess … I wouldn’t even know 
who to talk to with any chance of success … ;)

> >> Otherwise we’d need to set up a central host with passwords for dozens of 
> >> hosts to pull the SOL for logging and that doesn’t feel right either … -__
> > 
> > I know people that do this; it's not terrible.  You do have all of your
> > IPMI passwords in one place, that's the biggest issue, but IMHO you
> > should be monitoring the output of your consoles, anyway.
> Yeah, that’s what I’m pondering, too. IMHO it’s quite a bit terrible and thus 
> I was wondering whether the BMC might have a built-in solution that would 
> turn this upside down … but I guess not.
> > I support a program called ser2net that is capable of making SOL
> > connections, logging the output, and allowing connections to the
> > console.  That would be a pretty complicated setup, but I can help you
> > with it, if you like.
> The multiplexing sounds great. I’ve built a small shell wrapper to manage SOL 
> connections and their logging (and reconnecting if the BMC acts up) which 
> works for now.
> From a design perspective I’d really love this to be push-based. I researched 
> the dmtf site, but didn’t find anything there either … I guess I’m the 
> odd one out then …
> No idea.  So with your little wrapper connected everything seems to work ok.
> 
> Outside of the flow control thing, I have no idea.
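
The pull-based approach discussed above (hold an SOL session open, log whatever arrives, reconnect when the BMC drops the session) can be sketched roughly as follows. This is not the wrapper mentioned in this thread, just a minimal illustration. The helper names are invented for this example, and the ipmitool invocation is an assumption about a typical setup. Whether a given SOL client runs cleanly without a terminal on stdin needs to be verified; some may need a pseudo-terminal.

```python
import subprocess
import time
from typing import List, Optional


def ipmitool_sol_command(host: str, user: str) -> List[str]:
    """Build an ipmitool SOL command line (assumed setup, adjust as needed).

    -E makes ipmitool read the password from the IPMI_PASSWORD
    environment variable instead of the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
            "-E", "sol", "activate"]


def log_one_session(command: List[str], log_path: str) -> int:
    """Run one session, append its output to log_path, return exit status."""
    with open(log_path, "ab") as log, subprocess.Popen(
        command,
        stdin=subprocess.PIPE,      # kept open so the client sees no EOF
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    ) as proc:
        # read1() returns whatever is available, so a partial last line
        # written just before a crash still reaches the log.
        for chunk in iter(lambda: proc.stdout.read1(4096), b""):
            log.write(chunk)
            log.flush()
    return proc.returncode


def keep_logging(command: List[str], log_path: str,
                 retry_delay: float = 5.0,
                 max_sessions: Optional[int] = None) -> int:
    """Restart the session whenever it ends; return the number of sessions."""
    sessions = 0
    while max_sessions is None or sessions < max_sessions:
        status = log_one_session(command, log_path)
        sessions += 1
        with open(log_path, "ab") as log:
            marker = "--- session ended with status %d ---\n" % status
            log.write(marker.encode("ascii"))
        if max_sessions is None or sessions < max_sessions:
            time.sleep(retry_delay)
    return sessions
```

A call such as keep_logging(ipmitool_sol_command("bmc-host", "admin"), "/var/log/sol/bmc-host.log") would run until interrupted; the host name, user and log path here are placeholders. The passwords still end up on the central logging host, which is the drawback already noted above.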

Thanks for the input, though! I wasn’t sure I was missing something obvious. 
I’ll let you know if I should ever find out what’s going on here … 

Christian

-- 
Christian Theune · [email protected] · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick



_______________________________________________
Openipmi-developer mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openipmi-developer
