This doc patch provides an initial description of the HCall opcodes used
by a Linux kernel running as a guest operating system (LPAR) on top of
PowerVM or any other sPAPR-compliant hypervisor (e.g. QEMU).
Apart from documenting the HCalls, the doc patch also provides a
rudimentary overview of how HCalls are implemented inside the Linux
kernel and how information flows between the kernel and PowerVM/KVM.

Signed-off-by: Vaibhav Jain <vaib...@linux.ibm.com>
---
Change-log:

v5
* First patch in this patchset.
---
 Documentation/powerpc/hcalls.txt | 140 +++++++++++++++++++++++++++++++
 1 file changed, 140 insertions(+)
 create mode 100644 Documentation/powerpc/hcalls.txt

diff --git a/Documentation/powerpc/hcalls.txt b/Documentation/powerpc/hcalls.txt
new file mode 100644
index 000000000000..cc9dd872cecd
--- /dev/null
+++ b/Documentation/powerpc/hcalls.txt
@@ -0,0 +1,140 @@
+Hypervisor Call Op-codes (HCALLS)
+=================================
+
+Overview
+=========
+
+Virtualization on PPC64 arch is based on the PAPR specification[1] which
+describes the run-time environment for a guest operating system and how it
+should interact with the hypervisor for privileged operations. Currently there
+are two PAPR compliant hypervisors (PHYP):
+
+IBM PowerVM: IBM's proprietary hypervisor that supports AIX, IBM-i and Linux
+             as guests (termed Logical Partitions or LPARs).
+
+Qemu/KVM: Supports PPC64 Linux guests running on a PPC64 Linux host.
+
+On PPC64 arch a virtualized guest kernel runs in a non-privileged mode (HV=0).
+Hence to perform a privileged operation the guest issues a Hypervisor
+Call (HCALL) with necessary input operands. PHYP after performing the privileged
+operation returns a status code and output operands back to the guest.
+
+HCALL ABI
+=========
+The ABI specification for an HCall between the guest OS kernel and PHYP is
+described in [1]. The opcode for the HCall is set in R3 and subsequent
+in-arguments for the HCall are provided in registers R4-R12. On return from the
+'HVSC' instruction the status code of the HCall is available in R3 and the
+output parameters are returned in registers R4-R12.
+
+Powerpc arch code provides convenient wrappers named plpar_hcall_xxx defined in
+header 'hvcall.h' to issue HCalls from the Linux kernel running as a guest.
+
+
+DRC & DRC Indexes
+=================
+
+                  PAPR                Guest
+  DR1          Hypervisor              OS
+  +--+        +----------+         +---------+
+  |  |<------>|          |         |  User   |
+  +--+  DRC1  |          |   DRC   |  Space  |
+              |          |  Index  +---------+
+  DR2         |          |         |         |
+  +--+        |          |<------->| Kernel  |
+  |  |<------>|          |  HCall  |         |
+  +--+  DRC2  +----------+         +---------+
+
+PHYP terms shared hardware resources like PCI devices, NVDimms etc. available
+for use by LPARs as Dynamic Resources (DR). When a DR is allocated to an LPAR,
+PHYP creates a data-structure called Dynamic Resource Connector (DRC) to manage
+LPAR access. An LPAR refers to a DRC via an opaque 32-bit number called the
+DRC-Index. The DRC-Index value is provided to the LPAR via the device tree,
+where it is an attribute of the device tree node associated with the DR.
+
+HCALL Op-codes
+==============
+
+Below is a partial list of HCALLs that are supported by PHYP. For the
+corresponding opcode values please look into the header
+'arch/powerpc/include/asm/hvcall.h':
+
+* H_SCM_READ_METADATA:
+  Input: drcIndex, offset, buffer-address, numBytesToRead
+  Out: None
+  Description:
+  Given a DRC Index of an NVDimm, read N bytes from the metadata area
+  associated with it, at a specified offset, and copy them to the provided
+  buffer. The metadata area stores configuration information such as label
+  information, bad-blocks etc. The metadata area is located out-of-band of the
+  NVDimm storage area, hence separate access semantics are provided.
+
+* H_SCM_WRITE_METADATA:
+  Input: drcIndex, offset, data, numBytesToWrite
+  Out: None
+  Description:
+  Given a DRC Index of an NVDimm, write N bytes from the provided buffer at the
+  given offset to the metadata area associated with the NVDimm.
+
+
+* H_SCM_BIND_MEM:
+  Input: drcIndex, startingScmBlockIndex, numScmBlocksToBind, targetAddress
+  Out: guestMappedAddress, numScmBlockBound
+  Description:
+  Given a DRC-Index of an NVDimm, map the SCM (Storage Class Memory) blocks to
+  contiguous logical addresses in guest physical address space. The HCALL
+  arguments can be used to map a partial range of SCM blocks instead of the
+  entire NVDimm range to the LPAR.
+
+* H_SCM_UNBIND_MEM:
+  Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
+  Out: numScmBlocksUnbound
+  Description:
+  Given a DRC-Index of an NVDimm, unmap one or more of the SCM blocks from guest
+  physical address space. The HCALL can fail if the guest has an active PTE
+  for the SCM block being unbound.
+
+* H_SCM_QUERY_BLOCK_MEM_BINDING:
+  Input: drcIndex, scmBlockIndex
+  Out: Guest-Physical-Address
+  Description:
+  Given a DRC-Index and an SCM block index, return the guest physical address
+  to which the SCM block is mapped.
+
+* H_SCM_QUERY_LOGICAL_MEM_BINDING:
+  Input: Guest-Physical-Address
+  Out: drcIndex, scmBlockIndex
+  Description:
+  Given a guest physical address, return which DRC Index and SCM block is mapped
+  to that address.
+
+* H_SCM_UNBIND_ALL:
+  Input: scmTargetScope, drcIndex
+  Out: None
+  Description:
+  Depending on the target scope, unmap all SCM blocks belonging to all NVDimms
+  or all SCM blocks belonging to a single NVDimm identified by its drcIndex
+  from the LPAR memory.
+
+* H_SCM_HEALTH:
+  Input: drcIndex
+  Out: health-bitmap, health-bit-valid-bitmap
+  Description:
+  Given a DRC Index, return info on predictive failure and overall health of
+  the NVDimm. Each asserted bit in the health-bitmap indicates a single
+  predictive failure and the health-bit-valid-bitmap indicates which bits in
+  the health-bitmap are valid.
+
+
+* H_SCM_PERFORMANCE_STATS:
+  Input: drcIndex, resultBuffer Addr
+  Out: None
+  Description:
+  Given a DRC Index, collect the performance statistics for the NVDimm and copy
+  them to the resultBuffer.
+
+
+References
+==========
+[1]: "Linux on Power Architecture Platform Reference"
+     https://members.openpowerfoundation.org/document/dl/469
-- 
2.21.0
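
The register convention described under "HCALL ABI" can be sketched in plain
user-space C. This is an illustrative mock-up and not kernel code: every name
prefixed with demo_/DEMO_ is hypothetical, the opcode, status values, block
size and base address are made-up placeholders (the real definitions live in
'arch/powerpc/include/asm/hvcall.h'), and the rule that a DRC index of 0 is
rejected is an assumption made only to exercise the error path. The hypervisor
is replaced by an ordinary function so the flow can be followed end to end:
opcode in R3, inputs in R4-R12, status back in R3, outputs back in R4-R12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical values, for illustration only. Real opcodes and status
 * codes are defined in arch/powerpc/include/asm/hvcall.h.
 */
#define DEMO_H_SUCCESS		0
#define DEMO_H_PARAMETER	(-4)
#define DEMO_OP_QUERY_BLOCK	0x1000ULL
#define DEMO_BLOCK_SIZE		0x1000000ULL	/* assumed SCM block size */
#define DEMO_BIND_BASE		0x80000000ULL	/* assumed guest physical base */

/* Registers visible to an HCall: r[0] is R3, r[9] is R12. */
struct demo_regs {
	uint64_t r[10];
};

/* Stand-in for the hypervisor: consumes R3-R5, writes back R3 and R4. */
static void demo_hypervisor(struct demo_regs *regs)
{
	uint64_t opcode = regs->r[0];
	uint64_t drc_index = regs->r[1];
	uint64_t block = regs->r[2];

	/* Assumption for this demo: DRC index 0 is treated as invalid. */
	if (opcode != DEMO_OP_QUERY_BLOCK || drc_index == 0) {
		regs->r[0] = (uint64_t)(int64_t)DEMO_H_PARAMETER;
		return;
	}
	regs->r[0] = DEMO_H_SUCCESS;
	regs->r[1] = DEMO_BIND_BASE + block * DEMO_BLOCK_SIZE;
}

/* Mock wrapper: load opcode and inputs, "trap", copy status and outputs out. */
static int64_t demo_hcall(uint64_t opcode, const uint64_t *in, size_t nin,
			  uint64_t *out, size_t nout)
{
	struct demo_regs regs;

	assert(nin <= 9 && nout <= 9);	/* only R4-R12 carry operands */
	memset(&regs, 0, sizeof(regs));
	regs.r[0] = opcode;
	memcpy(&regs.r[1], in, nin * sizeof(*in));

	demo_hypervisor(&regs);	/* a real guest would trap to the hypervisor here */

	memcpy(out, &regs.r[1], nout * sizeof(*out));
	return (int64_t)regs.r[0];
}

/* Example caller: look up where one SCM block of a DRC is mapped. */
static int64_t demo_query_block(uint32_t drc_index, uint64_t block,
				uint64_t *gpa)
{
	uint64_t in[2] = { drc_index, block };
	uint64_t out[1] = { 0 };
	int64_t rc = demo_hcall(DEMO_OP_QUERY_BLOCK, in, 2, out, 1);

	if (rc == DEMO_H_SUCCESS)
		*gpa = out[0];
	return rc;
}
```

A caller would check the returned status before trusting the output, for
example calling demo_query_block(0x1234, 2, &gpa) and reading gpa only when
the result equals DEMO_H_SUCCESS. Keeping the status separate from the output
operands mirrors the split between R3 and R4-R12 described in the document.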