Supermicro (after pointing me to web interface and SNMP...): "Sorry, we do not have this Information at our support desk. you can request this via your sales channel, but it can be that you would need to sign an NDA for such information."
So we're on our own; I don't have any better contact, as we buy from a reseller. Besides, they'd want an NDA for those 3 lines of code.

Best,
Tom Hetmer

CDN77 Operations
supp...@cdn77.com / +44 (0) 20 3514 2399 / www.cdn77.com

----- Original message -----
From: "Tom Hetmer" <tomas.het...@cdn77.com>
To: "Al Chu" <ch...@llnl.gov>, freeipmi-users@gnu.org
Date: 12/11/18 12:09
Subject: Re[3]: [Freeipmi-users] Decoding ram errors on supermicro

Hey,

so that was fast - we've got an older X10SLM-F rented by a customer.

IPMI web says

201 2018/09/22 00:23:34 OEM Memory Correctable Memory ECC @ DIMMB2(CPU1)
202 2018/09/29 09:31:25 OEM Memory Correctable Memory ECC @ DIMMB2(CPU1)
203 2018/10/13 19:31:34 OEM Memory Correctable Memory ECC @ DIMMB2(CPU1)
204 2018/10/20 01:49:38 OEM Memory Correctable Memory ECC @ DIMMB2(CPU1)

freeipmi:

ID | Date | Time | Name | Type | State | Event
7 | Jan-21-2016 | 15:26:16 | FANA | Fan | Critical | Lower Critical - going low ; Sensor Reading = 0.00 RPM ; Threshold = 600.00 RPM
8 | Jan-21-2016 | 15:26:16 | FANA | Fan | Critical | Lower Non-recoverable - going low ; Sensor Reading = 0.00 RPM ; Threshold = 400.00 RPM
9 | Jan-21-2016 | 15:26:25 | FANA | Fan | Critical | Lower Non-recoverable - going low ; Sensor Reading = 13300.00 RPM ; Threshold = 400.00 RPM
10 | Jan-21-2016 | 15:26:25 | FANA | Fan | Warning | Lower Critical - going low ; Sensor Reading = 13300.00 RPM ; Threshold = 600.00 RPM
201 | Sep-22-2018 | 00:23:34 | Sensor #0 | Memory | Warning | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
202 | Sep-29-2018 | 09:31:25 | Sensor #0 | Memory | Warning | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
203 | Oct-13-2018 | 19:31:34 | Sensor #0 | Memory | Warning | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h
204 | Oct-20-2018 | 01:49:38 | Sensor #0 | Memory | Warning | Correctable memory error ; OEM Event Data2 code = 2Bh ; OEM Event Data3 code = 80h

We'll ask the customer for downtime to replace it. Everything should then be correct, as it's official data from Supermicro's own interface.

Best,
Tom Hetmer

----- Original message -----
From: "Tom Hetmer" <tomas.het...@cdn77.com>
To: freeipmi-users@gnu.org, "Al Chu" <ch...@llnl.gov>
Date: 12/11/18 11:59
Subject: Re[2]: [Freeipmi-users] Decoding ram errors on supermicro

Hi,

it appears we have no ECC errors on the servers we directly own right now. I can let you know when we get one though.
We rent out some machines to customers as well, so maybe there are some errors there => my colleague will check the report today.

I also created a ticket with Supermicro, just to see if they can confirm we're looking at the right code or add any official details.

Best,
Tom Hetmer

----- Original message -----
> From: "Al Chu" <ch...@llnl.gov>
> To: "Tom Hetmer" <tomas.het...@cdn77.com>, freeipmi-users@gnu.org
> Date: 12/11/18 02:28
> Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
>
> Hey Tom,
>
> Is there a specific motherboard (amongst the product IDs you mentioned
> below) you have with a dimm error that we can test on? To make sure I
> don't make a major mistake, I'd like to code to 1 motherboard first.
>
> Thanks,
> Al
>
>
> On Wed, 2018-12-05 at 10:48 -0800, Albert Chu wrote:
> > On Wed, 2018-12-05 at 03:38 +0100, Tom Hetmer wrote:
> > > Alright, added to github.
> > >
> > > Here's the output from bmc-info for that particular board.
> > > Product ID : 2201
> > > [Mon Dec 3 12:08:13 2018] DMI: Supermicro X10DRH LN4/X10DRH-CLN4,
> > > BIOS 2.0 01/30/2016
> > >
> > >
> > > I guess you'll support it based on the product ID?
> >
> > Yes! Thanks. I'll put these in the ticket too.
> >
> > Al
> >
> > > So if there are any other (X10) boards with a different product ID
> > > but the same SEL output, I'll have to send it again, correct?
> > >
> > >
> > > I have all kinds of numbers on other machines, e.g.
> > > X10DRW-E => 2148
> > > X11SPi-TF => 2369
> > > X10SLL-F => 2049
> > > X10DRL-i => 2097
> > > X11DDW-NT => 2407
> > > X10SLH-F/X10SLM+-F/X10SLH-F/X10SLM+-F => 2051
> > >
> > >
> > > and so on. I think we have at least 1/4 of the boards they
> > > manufacture.
> > > X9s are under 2000, X11 seems to be 23xx. But that's maybe too much
> > > reverse engineering for you ;)
> > > I can try to ping them and ask about details, but I've got no official
> > > contact with Supermicro.
> > >
> > >
> > > Best,
> > > Tom Hetmer
> > >
> > > ----- Original message -----
> > > > From: "Albert Chu" <ch...@llnl.gov>
> > > > To: "Tom Hetmer" <tomas.het...@cdn77.com>, freeipmi-users@gnu.org
> > > > Date: 12/04/18 19:40
> > > > Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> > > >
> > > > On Tue, 2018-12-04 at 11:39 +0100, Tom Hetmer wrote:
> > > > > Sure. It seems there's a similar ticket
> > > > > already: https://github.com/chu11/freeipmi-mirror/issues/19
> > > >
> > > > Ahh, if you could, update it with info from ipmitool / ipmiutil. I
> > > > was reluctant to add support based on reverse engineering. But if
> > > > other tools have "official" interpretations from Supermicro, I'm more
> > > > confident in the addition.
> > > >
> > > > > Yep, that's the code. ipmitool and a few others decode it too.
> > > > >
> > > > >
> > > > > We have a *lot* of Supermicros so I can help with testing if
> > > > > needed - but we don't get that many CRC errors though :)
> > > >
> > > > The one thing I'll need is product ID numbers (you can get them from
> > > > bmc-info) and the name of the product. This goes into the
> > > > documentation and some of the code.
> > > >
> > > > Thanks,
> > > >
> > > > Al
> > > >
> > > > > So I guess we'd have to wait till one pops up. But I hope the
> > > > > 'ver 2' method from ipmiutil works fine.
> > > > > We used ipmitool in our monitoring before and it was accurate
> > > > > but slow, that's why I rewrote it all to use freeipmi.
> > > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > > >
> > > > > Best,
> > > > > Tom Hetmer
> > > > >
> > > > > ----- Original message -----
> > > > > > From: "Albert Chu" <ch...@llnl.gov>
> > > > > > To: "Tom Hetmer" <tomas.het...@cdn77.com>, freeipmi-users@gnu.org
> > > > > > Date: 12/03/18 21:06
> > > > > > Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> > > > > >
> > > > > > Hi Tom,
> > > > > >
> > > > > > Thanks for the pointer to ipmiutil's code.
> > > > > > I assume you found this comment:
> > > > > >
> > > > > > ---
> > > > > > /* ver 2 method: 2A 80 = P1_DIMMB1 */
> > > > > > /* SuperMicro says:
> > > > > >  * pair: %c (data2 >> 4) + 0x40 + (data3 & 0x3) * 3, (='B')
> > > > > >  * dimm: %c (data2 & 0xf) + 0x27,
> > > > > >  * cpu: %x (data3 & 0x03) + 1);
> > > > > >  */
> > > > > > ---
> > > > > >
> > > > > > I can definitely add it to my todo list.
> > > > > >
> > > > > > Would you mind writing up an issue on github here?
> > > > > >
> > > > > > https://github.com/chu11/freeipmi-mirror
> > > > > >
> > > > > > Al
> > > > > >
> > > > > > On Mon, 2018-12-03 at 17:55 +0100, Tom Hetmer wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > it'd be good if freeipmi supported decoding the supermicro
> > > > > > > ECC errors.
> > > > > > >
> > > > > > >
> > > > > > > Manufacturer: Supermicro
> > > > > > > Product Name: X10DRH LN4
> > > > > > > e.g.
> > > > > > > freeipmi
> > > > > > > 1,Dec-01-2018,06:37:53,Sensor #0,Memory,Critical,Uncorrectable memory error ; OEM Event Data2 code = 3Ah ; OEM Event Data3 code = 81h
> > > > > > >
> > > > > > >
> > > > > > > web interface
> > > > > > > 1 | 12/01/2018 | 06:37:53 | Memory | Uncorrectable ECC (@DIMMG1(CPU2)) | Asserted
> > > > > > >
> > > > > > >
> > > > > > > something like this worked for me (stolen from ipmiutil)
> > > > > > >
> > > > > > >
> > > > > > > $cpu = ($data3 & 0x03) + 1;
> > > > > > >
> > > > > > >
> > > > > > > $NPAIRS = 26;
> > > > > > > $rgpairs = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> > > > > > >
> > > > > > >
> > > > > > > $bdata = "0x".$data2.$data3;
> > > > > > > $bdata = hexdec($bdata);
> > > > > > > $pair = (($bdata & 0xF0) >> 4) - 1;
> > > > > > >
> > > > > > >
> > > > > > > if ($pair < 0) $pair = 0;
> > > > > > > if ($pair > $NPAIRS) $pair = $NPAIRS - 1;
> > > > > > >
> > > > > > >
> > > > > > > $pair = $rgpairs[$pair - 1];
> > > > > > >
> > > > > > >
> > > > > > > $dimm = $bdata & 0x0F;
> > > > > > >
> > > > > > >
> > > > > > > $dimm may be incorrect, as the original code subtracts 9, but
> > > > > > > on that board it was wrong, so I changed it to get the right
> > > > > > > result - we'll see if it keeps getting the right values.
> > > > > > >
> > > > > > > Best,
> > > > > > > Tom Hetmer
> > > > > >
> > > > > > --
> > > > > > Albert Chu
> > > > > > ch...@llnl.gov
> > > > > > Computer Scientist
> > > > > > High Performance Systems Division
> > > > > > Lawrence Livermore National Laboratory

_______________________________________________
Freeipmi-users mailing list
Freeipmi-users@gnu.org
https://lists.gnu.org/mailman/listinfo/freeipmi-users
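
For anyone following along, here is a minimal C sketch of the "ver 2" formula from the ipmiutil comment quoted in this thread, applied to the OEM Event Data2/Data3 bytes that ipmi-sel prints. The struct and function names (smc_dimm_location, smc_decode_dimm, smc_format_dimm) are hypothetical and are not part of freeipmi or ipmiutil. The formula is only what that comment states; Supermicro did not confirm it in this thread, so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded DIMM location. Field names are illustrative only. */
struct smc_dimm_location {
  int cpu;       /* 1-based CPU number */
  char channel;  /* channel ("pair") letter, e.g. 'B' */
  char slot;     /* slot digit as a character, e.g. '2' */
};

/*
 * Apply the formula from the quoted ipmiutil comment:
 *   pair: (data2 >> 4) + 0x40 + (data3 & 0x3) * 3
 *   dimm: (data2 & 0xf) + 0x27
 *   cpu:  (data3 & 0x03) + 1
 * Note: a low nibble of data2 below 9h does not map to a digit here;
 * the thread does not say what such values would mean.
 */
static struct smc_dimm_location
smc_decode_dimm (uint8_t data2, uint8_t data3)
{
  struct smc_dimm_location loc;
  unsigned int cpu_index = data3 & 0x03;

  loc.cpu = (int) cpu_index + 1;
  loc.channel = (char) ((data2 >> 4) + 0x40 + cpu_index * 3);
  loc.slot = (char) ((data2 & 0x0f) + 0x27);
  return loc;
}

/* Format in the style the web interface uses, e.g. "DIMMB2(CPU1)". */
static int
smc_format_dimm (struct smc_dimm_location loc, char *buf, size_t buflen)
{
  assert (buf != NULL);
  return snprintf (buf, buflen, "DIMM%c%c(CPU%d)",
                   loc.channel, loc.slot, loc.cpu);
}
```

With the values from this thread, 2Bh/80h decodes to DIMMB2(CPU1), which matches the web interface on the X10SLM-F. 3Ah/81h decodes to DIMMF1(CPU2), while the X10DRH web interface reported DIMMG1(CPU2), so the per-CPU channel offset in the comment may not hold for every board and would need checking per product ID.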