Michael Ellerman <micha...@au1.ibm.com> writes: > Vaibhav Jain <vaib...@linux.ibm.com> writes: >> Thanks for reviewing this this patch Ira. My responses below: >> Ira Weiny <ira.we...@intel.com> writes: >>> On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote: >>>> Implement support for fetching nvdimm health information via >>>> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair >>>> of 64-bit big-endian integers, bitwise-and of which is then stored in >>>> 'struct papr_scm_priv' and subsequently partially exposed to >>>> user-space via newly introduced dimm specific attribute >>>> 'papr/flags'. Since the hcall is costly, the health information is >>>> cached and only re-queried, 60s after the previous successful hcall. > ... >>>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c >>>> b/arch/powerpc/platforms/pseries/papr_scm.c >>>> index f35592423380..142636e1a59f 100644 >>>> --- a/arch/powerpc/platforms/pseries/papr_scm.c >>>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c >>>> @@ -39,6 +78,15 @@ struct papr_scm_priv { >>>> struct resource res; >>>> struct nd_region *region; >>>> struct nd_interleave_set nd_set; >>>> + >>>> + /* Protect dimm health data from concurrent read/writes */ >>>> + struct mutex health_mutex; >>>> + >>>> + /* Last time the health information of the dimm was updated */ >>>> + unsigned long lasthealth_jiffies; >>>> + >>>> + /* Health information for the dimm */ >>>> + u64 health_bitmap; >>> >>> I wonder if this should be typed big endian as you mention that it is in the >>> commit message? >> This was discussed in an earlier review of the patch series at >> https://lore.kernel.org/linux-nvdimm/878sjetcis....@mpe.ellerman.id.au >> >> Even though health bitmap is returned in big endian format (For ex >> value 0xC00000000000000 indicates bits 0,1 set), its value is never >> used. Instead only test for specific bits being set in the register is >> done. > > This has already caused a lot of confusion, so let me try and clear it > up. I will probably fail :) > > The value is not big endian. > > It's returned in a GPR (a register), from the hypervisor. The ordering > of bytes in a register is not dependent on what endian we're executing > in. > > It's true that the hypervisor will have been running big endian, and > when it returns to us we will now be running little endian. But the > value is unchanged, it was 0xC00000000000000 in the GPR while the HV was > running and it's still 0xC00000000000000 when we return to Linux. You > can see this in mambo, see below for an example. > > > _However_, the specification of the bits in the bitmap value uses MSB 0 > ordering, as is traditional for IBM documentation. That means the most > significant bit, aka. the left most bit, is called "bit 0". > > See: https://en.wikipedia.org/wiki/Bit_numbering#MSB_0_bit_numbering > > That is the opposite numbering from what most people use, and in > particular what most code in Linux uses, which is that bit 0 is the > least significant bit. > > Which is where the confusion comes in. It's not that the bytes are > returned in a different order, it's that the bits are numbered > differently in the IBM documentation. > > The way to fix this kind of thing is to read the docs, and convert all > the bits into correct numbering (LSB=0), and then throw away the docs ;) > > cheers Thanks a lot for clarifying this Mpe and for this detailed explaination.
I have removed the term Big-Endian from v8 patch description to avoid any further confusion. > > > > In mambo we can set a breakpoint just before the kernel enters skiboot, > towards the end of __opal_call. The kernel is running LE and skiboot > runs BE. > > systemsim-p9 [~/skiboot/skiboot/external/mambo] b 0xc0000000000c1744 > breakpoint set at [0:0:0]: 0xc0000000000c1744 (0x00000000000C1744) > Enc:0x2402004C : hrfid > > Then run: > > systemsim-p9 [~/skiboot/skiboot/external/mambo] c > [0:0:0]: 0xC0000000000C1744 (0x00000000000C1744) Enc:0x2402004C : hrfid > INFO: 121671618: (121671618): ** Execution stopped: user (tcl), ** > 121671618: ** finished running 121671618 instructions ** > > And we stop there, on an hrfid that we haven't executed yet. > We can print r0, to see the OPAL token: > > systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0 > 0x0000000000000019 > > ie. we're calling OPAL_CONSOLE_WRITE_BUFFER_SPACE (25). > > And we can print the MSR: > > systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr > 0x9000000002001033 > > 64-bit mode (SF): 0x1 [64-bit mode] > Hypervisor State (HV): 0x1 > Vector Available (VEC): 0x1 > Machine Check Interrupt Enable (ME): 0x1 > Instruction Relocate (IR): 0x1 > Data Relocate (DR): 0x1 > Recoverable Interrupt (RI): 0x1 > Little-Endian Mode (LE): 0x1 [little-endian] > > ie. we're little endian. > > We then step one instruction: > > systemsim-p9 [~/skiboot/skiboot/external/mambo] s > [0:0:0]: 0x0000000030002BF0 (0x0000000030002BF0) Enc:0x7D9FFAA6 : mfspr > r12,PIR > > Now we're in skiboot. Print the MSR again: > > systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr > 0x9000000002001002 > > 64-bit mode (SF): 0x1 [64-bit mode] > Hypervisor State (HV): 0x1 > Vector Available (VEC): 0x1 > Machine Check Interrupt Enable (ME): 0x1 > Recoverable Interrupt (RI): 0x1 > > We're big endian. > Print r0: > > systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0 > 0x0000000000000019 > > r0 is unchanged! Got it. Thanks again. -- Cheers ~ Vaibhav