Tue, 6 Oct 2015 21:41:15 -0700 Mike Larkin <mlar...@azathoth.net>
> I had thought this was acpi related earlier (before we realized that disabling
> lm* fixes it). So I have no news here, as I don't think the solution is going
> to be found in the AML.

Thanks for the update and the pointer in the right direction (disabling
the lm(4) sensor). Indeed, this does not happen with bsd.rd during
upgrades, and as far as I recall the issue may not have been present
originally, back in 2011.

> The lm(4) sensor is probably getting wedged somehow, which is causing the bios
> to think the machine is too hot on reboot. Even though it's not.

Makes sense, as the readings are improbable after running for a while
and things look stuck somehow, including in the BMC web interface.

Side note: I know, but let's not drift into the insane design flaws
regarding security and interfaces; there are popcorn-scary topics on the
list for that. Please stay on topic.

Here is the reading from a long run where the sensors appear stuck both
on the shell and in the BMC:

$ sysctl hw.sensors    
hw.sensors.cpu0.temp0=33.00 degC
hw.sensors.lm1.temp0=-1.00 degC
hw.sensors.lm1.temp1=-0.50 degC
hw.sensors.lm1.temp2=-0.50 degC
hw.sensors.lm1.volt0=2.04 VDC (VCore)
hw.sensors.lm1.volt1=13.46 VDC (+12V)
hw.sensors.lm1.volt2=4.08 VDC (+3.3V)
hw.sensors.lm1.volt3=4.08 VDC (+3.3V)
hw.sensors.lm1.volt4=1.85 VDC (-12V)
hw.sensors.lm1.volt5=0.00 VDC
hw.sensors.lm1.volt6=0.00 VDC
hw.sensors.lm1.volt7=4.08 VDC (3.3VSB)
hw.sensors.lm1.volt8=2.04 VDC (VBAT)
$
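
To put "improbable" into something checkable, here is a minimal Python
sketch that flags readings in sysctl output. The rules are my own
assumptions, not anything the lm(4) driver defines: a temperature at or
below 0 degC is treated as bogus, and a rail whose label starts with a
nominal voltage (such as +12V or 3.3VSB) is flagged when it is more than
10% off that nominal value. The function name and tolerance are
hypothetical.

```python
import re

# Readings copied from the stuck run above.
STUCK = """\
hw.sensors.cpu0.temp0=33.00 degC
hw.sensors.lm1.temp0=-1.00 degC
hw.sensors.lm1.temp1=-0.50 degC
hw.sensors.lm1.temp2=-0.50 degC
hw.sensors.lm1.volt0=2.04 VDC (VCore)
hw.sensors.lm1.volt1=13.46 VDC (+12V)
hw.sensors.lm1.volt2=4.08 VDC (+3.3V)
hw.sensors.lm1.volt3=4.08 VDC (+3.3V)
hw.sensors.lm1.volt4=1.85 VDC (-12V)
hw.sensors.lm1.volt5=0.00 VDC
hw.sensors.lm1.volt6=0.00 VDC
hw.sensors.lm1.volt7=4.08 VDC (3.3VSB)
hw.sensors.lm1.volt8=2.04 VDC (VBAT)
"""

# name=value unit, optionally followed by a "(label)".
LINE = re.compile(
    r"^hw\.sensors\.(\S+?)=(-?\d+(?:\.\d+)?) (degC|VDC)(?: \((.+)\))?$"
)
# A label that starts with a nominal voltage, e.g. "+12V", "-12V", "3.3VSB".
NOMINAL = re.compile(r"^([+-]?\d+(?:\.\d+)?)V")


def implausible(sysctl_output, tolerance=0.10):
    """Return the sensor names whose reading breaks the assumed rules."""
    flagged = []
    for line in sysctl_output.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # prompts, blank lines, anything unexpected
        name, value, unit, label = m.group(1), float(m.group(2)), m.group(3), m.group(4)
        if unit == "degC":
            # Assumption: a running board indoors never reads 0 degC or less.
            if value <= 0:
                flagged.append(name)
        elif label:
            nominal = NOMINAL.match(label)
            if nominal:
                expected = float(nominal.group(1))
                if abs(value - expected) > tolerance * abs(expected):
                    flagged.append(name)
    return flagged


if __name__ == "__main__":
    for name in implausible(STUCK):
        print("suspect:", name)
```

Rails without a nominal value in the label (VCore, VBAT, the unlabelled
ones) are skipped on purpose, since there is nothing in the output to
compare them against.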

And again after resetting the IPMI device; now only some of the
readings look incorrect, and nothing is as stuck as above:

$ sysctl hw.sensors 
hw.sensors.cpu0.temp0=33.00 degC
hw.sensors.lm1.temp0=41.00 degC
hw.sensors.lm1.temp1=42.00 degC
hw.sensors.lm1.temp2=26.00 degC
hw.sensors.lm1.volt0=1.10 VDC (VCore)
hw.sensors.lm1.volt1=6.86 VDC (+12V)
hw.sensors.lm1.volt2=3.33 VDC (+3.3V)
hw.sensors.lm1.volt3=3.33 VDC (+3.3V)
hw.sensors.lm1.volt4=-10.34 VDC (-12V)
hw.sensors.lm1.volt5=1.28 VDC
hw.sensors.lm1.volt6=1.82 VDC
hw.sensors.lm1.volt7=3.28 VDC (3.3VSB)
hw.sensors.lm1.volt8=1.57 VDC (VBAT)


> I don't know a lot about the lm(4) driver so I don't think I'll be able to
> help much here. One of the things I do know about it is that sometimes you
> don't actually even have a real lm(4), and that it's simulated by some other
> component or even SMM. Maybe the manufacturer did a poor job. Shrug.

Please compare the above with the values presented in the BMC web
interface:

Name            Status          Reading
System Temp     Normal          41 degrees C
CPU Temp        Normal          42 degrees C
CPU FAN         N/A             Not Present!
SYS FAN         N/A             Not Present!
CPU Vcore       Normal          1.096 Volts
Vichcore        Normal          1.04 Volts
+3.3VCC         Normal          3.328 Volts
VDIMM           Normal          1.528 Volts
+5 V            Normal          5.12 Volts
+12 V           Normal          12.084 Volts
+3.3VSB         Normal          3.28 Volts
VBAT            Normal          3.136 Volts
Chassis Intru                   OK
PS Status                       Presence detected.

Here is the same from ipmitool over the network:

$ ipmitool -I lanplus -U musr1 -f .ipmipass -H 10.10.10.10 sdr 
System Temp      | 41 degrees C      | ok
CPU Temp         | 42 degrees C      | ok
CPU FAN          | no reading        | ns
SYS FAN          | no reading        | ns
CPU Vcore        | 1.10 Volts        | ok
Vichcore         | 1.04 Volts        | ok
+3.3VCC          | 3.33 Volts        | ok
VDIMM            | 1.54 Volts        | ok
+5 V             | 5.12 Volts        | ok
+12 V            | 12.08 Volts       | ok
+3.3VSB          | 3.28 Volts        | ok
VBAT             | 3.14 Volts        | ok
Chassis Intru    | 0x00              | ok
PS Status        | 0x00              | ok
$
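
For a side-by-side view, here is a minimal Python sketch that pairs the
post-reset lm(4) readings with the BMC readings from ipmitool and prints
the ratio wherever the two disagree. The pairing of lm1 sensors to BMC
names is my assumption, based only on the sysctl labels; the function
name and the 5% tolerance are hypothetical too.

```python
# (lm(4) reading, BMC reading) per sensor, values copied from the outputs
# above. Which lm1 sensor maps to which BMC sensor is an assumption.
PAIRS = {
    "temp0 / System Temp": (41.00, 41.0),
    "temp1 / CPU Temp": (42.00, 42.0),
    "volt0 (VCore) / CPU Vcore": (1.10, 1.10),
    "volt1 (+12V) / +12 V": (6.86, 12.08),
    "volt2 (+3.3V) / +3.3VCC": (3.33, 3.33),
    "volt7 (3.3VSB) / +3.3VSB": (3.28, 3.28),
    "volt8 (VBAT) / VBAT": (1.57, 3.14),
}


def mismatches(pairs, tolerance=0.05):
    """Return {label: BMC/lm ratio} where the ratio is off 1.0 by more than tolerance."""
    result = {}
    for label, (lm_value, bmc_value) in pairs.items():
        ratio = bmc_value / lm_value
        if abs(ratio - 1.0) > tolerance:
            result[label] = round(ratio, 2)
    return result


if __name__ == "__main__":
    for label, ratio in mismatches(PAIRS).items():
        print(f"{label}: BMC/lm = {ratio}")
```

With the numbers above only +12V (ratio about 1.76) and VBAT (ratio 2.0)
stand out, while the temperatures, VCore and both 3.3 V rails agree.
A constant ratio might point at a scaling factor rather than a stuck
chip, but that is only a guess on my side.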

Same thing with more details and thresholds (untouched from defaults,
for reference only, in case this is where the lm(4) sensor is getting
some of the funny values):

$ ipmitool -I lanplus -U musr1 -f .ipmipass -H 10.10.10.10 sensor 
System Temp      | 42.000     | degrees C  | ok    | -9.000    | -7.000    | -5.000    | 75.000    | 77.000    | 79.000
CPU Temp         | 42.000     | degrees C  | ok    | -11.000   | -8.000    | -5.000    | 85.000    | 90.000    | 95.000
CPU FAN          | na         |            | na    | na        | na        | na        | na        | na        | na
SYS FAN          | na         |            | na    | na        | na        | na        | na        | na        | na
CPU Vcore        | 1.096      | Volts      | ok    | 0.640     | 0.664     | 0.688     | 1.344     | 1.408     | 1.472
Vichcore         | 1.040      | Volts      | ok    | 0.808     | 0.824     | 0.840     | 1.160     | 1.176     | 1.192
+3.3VCC          | 3.328      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
VDIMM            | 1.528      | Volts      | ok    | 1.312     | 1.328     | 1.344     | 1.648     | 1.664     | 1.680
+5 V             | 5.120      | Volts      | ok    | 4.096     | 4.320     | 4.576     | 5.344     | 5.600     | 5.632
+12 V            | 12.084     | Volts      | ok    | 10.706    | 10.600    | 10.494    | 13.091    | 13.197    | 13.303
+3.3VSB          | 3.280      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712
VBAT             | 3.136      | Volts      | ok    | 2.560     | 2.624     | 2.688     | 3.328     | 3.392     | 3.456
Chassis Intru    | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na
PS Status        | 0x0        | discrete   | 0x00ff| na        | na        | na        | na        | na        | na
$
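
To check an lm(4) value against these thresholds without squinting,
here is a minimal Python sketch that parses one row of the ipmitool
sensor output. The column order (lower non-recoverable, lower critical,
lower non-critical, upper non-critical, upper critical, upper
non-recoverable) is my understanding of ipmitool's layout, and the
field and function names are hypothetical.

```python
# Column names for one "ipmitool sensor" row. The threshold order is an
# assumption about ipmitool's layout, not taken from the output itself.
FIELDS = ("name", "reading", "unit", "status",
          "lnr", "lcr", "lnc", "unc", "ucr", "unr")

# The +12 V row from the output above, joined onto one line.
SAMPLE = ("+12 V            | 12.084     | Volts      | ok    | 10.706    "
          "| 10.600    | 10.494    | 13.091    | 13.197    | 13.303")


def parse_sensor_line(line):
    """Split one row into a dict; numeric cells become floats, others None."""
    cells = [cell.strip() for cell in line.split("|")]
    if len(cells) != len(FIELDS):
        return None  # wrapped or otherwise unexpected row
    row = dict(zip(FIELDS, cells))
    for key in ("reading",) + FIELDS[4:]:
        try:
            row[key] = float(row[key])
        except ValueError:
            row[key] = None  # "na" or a discrete value such as 0x0
    return row


def within_noncritical(row, value):
    """True if value lies between the lower and upper non-critical thresholds."""
    low, high = row["lnc"], row["unc"]
    return low is not None and high is not None and low <= value <= high


if __name__ == "__main__":
    row = parse_sensor_line(SAMPLE)
    for source, value in (("BMC", 12.084), ("lm post-reset", 6.86), ("lm stuck", 13.46)):
        print(source, value, "ok" if within_noncritical(row, value) else "out of range")
```

With the +12 V row, the BMC reading passes while both lm(4) values for
that rail (6.86 after the reset, 13.46 when stuck) fall outside the
non-critical window.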

> Sorry, I'm out of ideas. Maybe someone else can debug it for you.

Thanks Mike, kudos for the great work going on in the important areas
and thanks for the follow up and help so far. Now, some more details to
add below to get back on the right track...

> > so on a related note, i'm on the hunt for something which can replace
> > this board's functionality without breaking the bank.

Regarding the other late post in the thread: Supermicro boards always
break the bank, and they must work or be returned to the manufacturer.
This is the wrong attitude on display here; the original poster and I,
as well as others, need this fixed on our (OpenBSD) end if possible and
will keep chasing it for as long as it takes. To sum up, if you can't
add tech details, please don't add discouragement.

> > something not supported by
> > supermicro, as this is a brand new board

Spring 2011 actually for my MBD-X7SPA-HF-D525-O bought brand new in
original packing from official retailer carrying Supermicro products:

http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA-HF-D525.cfm

> > and they seem to be unwilling to 
> > provide support anyway.

Sales (marketing) support is different than technical (integrator)
support and has nothing to do with actual engineering (actual technical)
and systems design (r&d) support.

You should have at least been walked through the BIOS R1.2b and
IPMI R3.16 (SMT) upgrades as far as the integrator tech support goes.

That being said, you have to be patient and allow them to escalate the
issue internally, if you managed to get anywhere with their support
system.

Note: Sadly, my local sales reps are not on the map for escalation to
the internal engineering team; living in rural European states has its
downsides. I'm also out of the sales warranty given the years passed
since purchase, but the board is still in production, meaning it is
still being manufactured and is available as stock (global SKUs).

So, the other approach apparently is to somehow reach the Supermicro
people beyond general tech support, and this does not invalidate looking
sideways for additional tech support based on the HW tech spec sheet.

> > remote kvm/power is the sole purpose for choosing this
> > supermicro device in the first place.

Same here, and one of the main price points as well. You (Dewey pls)
have to chase this further, as the system works OK for remote
deployments once you figure out that the lm(4) sensor has to be
disabled and don't get trapped by the watchdog (pun intended).

> > i have plenty much more expensive and
> > more powerful supermicro devices at customer sites which do not show
> > this issue

Different boards mean different hardware specs and another set of issues
known to the R&D team(s). That is not the point here; the fact is that
what we bought must work, and if it's a software issue, we can fix it
one way or another.

> > - but their non-support of this brand-new motherboard shows me that
> > they
> > are not who i want to be relying on.

Your call. I'm sticking with Supermicro for now; compared to the general
consumer-class main boards, over the years I've seen much worse in every
aspect, and I'm always looking for better (capacitor plague, anyone?).

So I'm kindly asking for more help and pointers from the OpenBSD devs
behind the sensors framework, who may have more than an idea how to fix
this or improve the situation a bit further.

Thanks everyone, and Dewey for raising this up.

P.S. I recall someone posting pointers on enabling the watchdog a while
back on the mailing lists too (I have not got to that yet, since this
long-beep-on-reboot issue is more important).

Regards,
Anton
