On 3/29/2017 5:41 PM, [email protected] wrote:
Dell - Internal Use - Confidential
-----Original Message-----
From: Dan Williams [mailto:[email protected]]
Sent: Wednesday, March 29, 2017 1:13 PM
To: Lijun Pan <[email protected]>
Cc: [email protected]; Pan, Lijun <[email protected]>; Hayes,
Stuart <[email protected]>
Subject: Re: [PATCH v2] libndctl: add support for the MSFT family of DSM
functions
On Sun, Mar 26, 2017 at 1:17 PM, Lijun Pan <[email protected]> wrote:
From: Lijun Pan <[email protected]>
This patch retrieves the health data from NVDIMM-N via
the MSFT _DSM function[1], following JESD245A[2] standards.
Now 'ndctl list --dimms --health --idle' could work
on MSFT type NVDIMM-N, but limited to health_state,
temperature_celsius, and life_used_percentage.
Looks good, can you add sample output of:
ndctl list --dimms --health --idle
Dan,
Do you want me to put this sample output in the v3 patch's commit message?
Output is something like,
"health":{
"health_state":"ok",
"temperature_celsius":193.750000,
Is that a real example from a running system?
If so, something is clearly wrong.
"life_used_percentage":1
}
If the BIOS returns a value say 45.75, how can it be represented in (__u16)
instead of using a 'double' type? I think the ((__u16) temp & 0x0FFC) already
represents a temperature in 1/16 Celsius granularity. No need to multiple it by
16 again.
Below I quote the section 7.8 of JESD245.pdf
Why are you quoting that spec? It doesn't matter what the JEDEC
spec says. What matters is what the Microsoft DSM spec says.
The Microsoft DSM spec says that the temperature is returned in degrees C.
If you're a FW developer, then you care very much what the JEDEC
spec says but the FW should convert the temperature to the units
that the DSM says it returns to the OS, which is degrees C.
Bit 3 has a resolution of 1/2 Celsius,
Bit 2 has a resolution of 1/4 Celsius,
Bit 1 has 1/8 Celsius, but is reserved, so I assume it be zero,
Bit 0 has 1/16 Celsius, but is reserved, so I assume it be zero.
((__u16) temp & 0x0FFC) will only get the bit11 - bit 0. This presents the
1/16 Celsius resolution, kind of left shifted 4 bits.
So we don't need to do
'return CMD_MSFT_SMART(cmd)->temp * 16;' again.
You're looking at the wrong spec. This is what you should
be looking at.
https://msdn.microsoft.com/en-us/library/windows/hardware/mt604717(v=vs.85).aspx
-- ljk
==== = quotation starts =====
"
Temperature measurement shall have a minimum resolution of 0.25 Celsius.
Registers containing measured temperature value shall be 16-bits and report
temperature as a 10-bit value based on the following definition.
Table 3: Temperature value bit definition
Bit15 Bit14 Bit13 Bit12 Bit11 Bit10 Bit9 Bit8
Reserved Reserved Reserved Reserved 128 64
32 16
Bit7 Bit6 Bit5 Bit4 Bit3 Bit2 Bit1 Bit0
8 4 2 1 0.5 0.25 Reserved Reserved
Examples:
A temperature value of 10.5 Celsius is represented as 0000000010101000b.
A temperature value of 64.75 Celsius is represented as 0000010000001100b.
"
==== quotation ends ========
Lijun
...so users know what to expect. With that change and addressing
Linda's comment about the temperature multiplier I think we're good to
go.
Also, if you want to add Microsoft-only health attributes we'll need
to add new "valid" flags beyond the current ND_SMART_*_VALID set. If
this goes beyond the current 32 "valid" flags that that
ndctl_cmd_smart_get_flags() returns we might need a new
ndctl_cmd_smart_get_flags2() call that returns an arbitrary bitmap of
valid flags.
We'd also need to move those definitions to an ndctl local header.
Currently where they are defined now in ndctl.h is owned by the
kernel. We can cross this bridge later in a follow-on patch.
I will do it on a follow-up patch later.
Lijun
_______________________________________________
Linux-nvdimm mailing list
[email protected]
https://lists.01.org/mailman/listinfo/linux-nvdimm
_______________________________________________
Linux-nvdimm mailing list
[email protected]
https://lists.01.org/mailman/listinfo/linux-nvdimm