On 11/01/2021 12:46, Jan Kiszka wrote:
> On 11.01.21 12:21, Andrea Bastoni wrote:
>> Hi Jan,
>>
>> On 11/01/2021 08:36, Jan Kiszka wrote:
>>> On 25.11.20 17:12, Jan Kiszka wrote:
>>>> On 25.11.20 16:59, Andrea Bastoni wrote:
>>>>> On 25/11/2020 11:51, Jan Kiszka wrote:
>>>>>> On 25.11.20 10:15, Andrea Bastoni wrote:
>>>>>>> On 25/11/2020 07:07, Jan Kiszka wrote:
>>>>>>>> On 23.11.20 21:45, Andrea Bastoni wrote:
>>>>>>>>> Hello Jan, hi all,
>>>>>>>>>
>>>>>>>>> As discussed a couple of weeks ago, here an updated version of the 
>>>>>>>>> "cache-coloring" for Jailhouse.
>>>>>>>>>
>>>>>>>>> A short recap on what's coloring (see 
>>>>>>>>> Documentation/cache-coloring.md):
>>>>>>>>>> Cache coloring is a software technique that permits LLC partitioning,
>>>>>>>>>> therefore eliminating mutual core interference, and thus 
>>>>>>>>>> guaranteeing more
>>>>>>>>>> predictable performances for memory accesses.
>>>>>>>>>
>>>>>>>>> In this version of the patch, we've tried to keep the changes in the 
>>>>>>>>> hypervisor core minimal. Also, since coloring is something that 
>>>>>>>>> should be transparent to inmates/VM (coloring belongs to the IPA/PA 
>>>>>>>>> hypervisor translations), the Linux driver has been only minimally 
>>>>>>>>> modified.
>>>>>>>>>
>>>>>>>>> Coloring API
>>>>>>>>> ------------
>>>>>>>>>
>>>>>>>>> The patch adds a small architecture API layer that implements 
>>>>>>>>> coloring. The API is fully implemented only on arm64.  arm32 could be 
>>>>>>>>> also a possible target for coloring, x86 already has CAT. Despite 
>>>>>>>>> being implemented by only one architecture, having a small common API 
>>>>>>>>> layer integrates IMHO better than other solutions in the main code 
>>>>>>>>> (without adding conditionally compiled code).
>>>>>>>>>
>>>>>>>>> The API is:
>>>>>>>>>     arch_color_map_memory_region()   // cell_create
>>>>>>>>>         Color-maps the memory of the cell. Coloring requires multiple 
>>>>>>>>> passes to keep a contiguous VA, while "punching holes" in the PA 
>>>>>>>>> according to the allocated colors.
>>>>>>>>>
>>>>>>>>>     arch_color_unmap_memory_region() // cell_destroy
>>>>>>>>>         The opposite of create: search and give back each piece of 
>>>>>>>>> colored PA.
>>>>>>>>>
>>>>>>>>>     arch_color_remap_to_root()       // cell_load
>>>>>>>>>         Given that we don't want to modify the driver (i.e., we don't 
>>>>>>>>> want the driver to do non-contiguous color-ioremap) we let the root 
>>>>>>>>> cell believe that the IPAs it is ioremapping are contiguous, but in 
>>>>>>>>> reality the PA are non-contiguous (colored). This is currently done 
>>>>>>>>> by using a very high VA (IPA) address as "base" for the 
>>>>>>>>> load-remap-to-root-cell operation. This value is configuration 
>>>>>>>>> dependent and can be specified in the root cell to avoid (unlikely) 
>>>>>>>>> conflicts.
>>>>>>>>>
>>>>>>>>>     arch_color_unmap_from_root()     // cell_start
>>>>>>>>>         Returns the mapping setup by the cell_load to the root cell.
>>>>>>>>>
>>>>>>>>> Two APIs are implemented only by arm64:
>>>>>>>>>     arm_color_dcache_flush_memory_region()  // range flushing
>>>>>>>>>         Also flushing should be done in a "colored way".
>>>>>>>>>
>>>>>>>>>     arm_color_init()                        // hook for the 
>>>>>>>>> initialization of coloring
>>>>>>>>>         Reads the parameters (size of the LLC way, base offset for 
>>>>>>>>> the load-remapping operation) needed by coloring. There's also some 
>>>>>>>>> debugging code to autodetect the cache-geometry and determine the 
>>>>>>>>> size of the way.
>>>>>>>>>
>>>>>>>>> Configuration
>>>>>>>>> -------------
>>>>>>>>>
>>>>>>>>> From the configuration point of view, a colored memory region is a 
>>>>>>>>> normal memory region with a color attached. The color is expressed 
>>>>>>>>> directly in the memory region as color-bitmask and the flag 
>>>>>>>>> JAILHOUSE_MEM_COLORED is used to identify such a region. Contiguous 
>>>>>>>>> bits in the bitmask identify a color (region) to be used for the 
>>>>>>>>> memory region. The size of the mapping doesn't change due to coloring.
>>>>>>>>>
>>>>>>>>> For example, with 16 colors, a full way size is selected by 0xffff, 
>>>>>>>>> while 1/4 of the way size is selected by 0x000f.
>>>>>>>>>
>>>>>>>>> (Note: we also experimented with a different version that defined 
>>>>>>>>> separate "normal" and "colored" memory regions. The approach 
>>>>>>>>> presented in this version is more robust.)
>>>>>>>>>
>>>>>>>>> Main coloring "loop"
>>>>>>>>> --------------------
>>>>>>>>>
>>>>>>>>> The coloring APIs basically boil down to a loop that appropriately 
>>>>>>>>> selects --according to the color-- the physical addresses where to 
>>>>>>>>> perform a selected operation. The selection is done efficiently with 
>>>>>>>>> bit operations on the bitmask.
>>>>>>>>>
>>>>>>>>> An alternative approach is to hook coloring deep into the mapping 
>>>>>>>>> functionalities of the hypervisor. Basically, all low level mapping 
>>>>>>>>> functions support coloring, and the non-colored case becomes the 
>>>>>>>>> "special case."
>>>>>>>>> This approach was not followed because to be implemented "cleanly", 
>>>>>>>>> all the architecture mapping APIs have to be extended to support 
>>>>>>>>> coloring, but only one-architecture then really implement it.
>>>>>>>>>
>>>>>>>>> Instead, the implemented approach to have an additional "coloring" 
>>>>>>>>> API integrates more nicely. Additionally, coloring will be hopefully 
>>>>>>>>> only a transitory techniques (some years?) that could be eventually 
>>>>>>>>> replaced by hardware-based techniques (e.g., MPAM). When this 
>>>>>>>>> happens, removing the additional coloring API requires less rework 
>>>>>>>>> than the other approach.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Debugging Code
>>>>>>>>> --------------
>>>>>>>>>
>>>>>>>>> The patches also provide a "debugging-only" code that can be 
>>>>>>>>> activated by defining CONFIG_DEBUG (it sounded more 
>>>>>>>>> Jailhouse-oriented than NDEBUG). In debugging mode, the coloring 
>>>>>>>>> provides an autodetection for arm64 cache geometry.
>>>>>>>>> Additionally, I've added an assert() function that can be used at 
>>>>>>>>> runtime in debugging only contexts (could be used independently from 
>>>>>>>>> coloring).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Feedback and comments welcome.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Andrea
>>>>>>>>>
>>>>>>>>> Andrea Bastoni (12):
>>>>>>>>>   arm-common: bitops: add most-significant-bit operation
>>>>>>>>>   hypervisor, driver, archs: add basic empty infrastructure for 
>>>>>>>>> coloring
>>>>>>>>>   hypervisor: arm64: add cache coloring implementation
>>>>>>>>>   hypervisor configuration: hook autodetection for coloring_way_size
>>>>>>>>>   configs: arm64: add example configuration for colored ZCU102 inmate
>>>>>>>>>   hypervisor: protect inclusion of control.h
>>>>>>>>>   hypervisor, driver: add platform information to configure coloring
>>>>>>>>>     params
>>>>>>>>>   configs: arm64: hook-in coloring parameters for ZCU102
>>>>>>>>>   hypervisor: provide runtime assert() helper for DEBUG only
>>>>>>>>>   hypervisor: provide implementation for __assert_fail() and group
>>>>>>>>>     panic-related functions
>>>>>>>>>   hypervisor: coloring: use assert instead of "BUG"
>>>>>>>>>   hypervisor: coloring: make cache autodetection debug-only
>>>>>>>>>
>>>>>>>>> Luca Miccio (2):
>>>>>>>>>   Documentation: add description and usage of cache coloring support
>>>>>>>>>   pyjailhouse: add support for colored regions
>>>>>>>>>
>>>>>>>>>  Documentation/cache-coloring.md               | 198 
>>>>>>>>> ++++++++++++++++++
>>>>>>>>>  configs/arm64/zynqmp-zcu102-inmate-demo-col.c |  75 +++++++
>>>>>>>>>  configs/arm64/zynqmp-zcu102.c                 |   6 +
>>>>>>>>>  driver/cell.c                                 |  10 +-
>>>>>>>>>  driver/cell.h                                 |   1 +
>>>>>>>>>  driver/main.c                                 |  12 ++
>>>>>>>>>  hypervisor/Makefile                           |   2 +-
>>>>>>>>>  hypervisor/arch/arm-common/control.c          |   1 +
>>>>>>>>>  .../arch/arm-common/include/asm/bitops.h      |  10 +
>>>>>>>>>  .../arch/arm-common/include/asm/dcaches.h     |   8 +
>>>>>>>>>  hypervisor/arch/arm-common/mmu_cell.c         |  52 +++--
>>>>>>>>>  hypervisor/arch/arm/control.c                 |   1 +
>>>>>>>>>  hypervisor/arch/arm/include/asm/coloring.h    |  59 ++++++
>>>>>>>>>  hypervisor/arch/arm/traps.c                   |   1 +
>>>>>>>>>  hypervisor/arch/arm64/Kbuild                  |   5 +
>>>>>>>>>  hypervisor/arch/arm64/cache_layout.c          | 148 +++++++++++++
>>>>>>>>>  hypervisor/arch/arm64/coloring.c              | 184 ++++++++++++++++
>>>>>>>>>  hypervisor/arch/arm64/control.c               |   1 +
>>>>>>>>>  .../arch/arm64/include/asm/cache_layout.h     |  21 ++
>>>>>>>>>  hypervisor/arch/arm64/include/asm/coloring.h  | 137 ++++++++++++
>>>>>>>>>  hypervisor/arch/arm64/setup.c                 |   3 +
>>>>>>>>>  hypervisor/arch/arm64/traps.c                 |   1 +
>>>>>>>>>  hypervisor/arch/x86/amd_iommu.c               |   1 +
>>>>>>>>>  hypervisor/arch/x86/control.c                 |   1 +
>>>>>>>>>  hypervisor/arch/x86/efifb.c                   |   1 +
>>>>>>>>>  hypervisor/arch/x86/include/asm/coloring.h    |  45 ++++
>>>>>>>>>  hypervisor/arch/x86/ioapic.c                  |   1 +
>>>>>>>>>  hypervisor/arch/x86/svm.c                     |   1 +
>>>>>>>>>  hypervisor/arch/x86/vmx.c                     |   1 +
>>>>>>>>>  hypervisor/control.c                          | 111 ++++------
>>>>>>>>>  hypervisor/include/jailhouse/assert.h         |  37 ++++
>>>>>>>>>  hypervisor/include/jailhouse/control.h        |  34 +--
>>>>>>>>>  hypervisor/include/jailhouse/panic.h          |  40 ++++
>>>>>>>>>  hypervisor/include/jailhouse/printk.h         |   4 +
>>>>>>>>>  hypervisor/panic.c                            |  86 ++++++++
>>>>>>>>>  hypervisor/pci.c                              |   1 +
>>>>>>>>>  hypervisor/printk.c                           |   1 +
>>>>>>>>>  hypervisor/uart.c                             |   1 +
>>>>>>>>>  include/jailhouse/cell-config.h               |  11 +
>>>>>>>>>  pyjailhouse/config_parser.py                  |   9 +-
>>>>>>>>>  40 files changed, 1201 insertions(+), 121 deletions(-)
>>>>>>>>>  create mode 100644 Documentation/cache-coloring.md
>>>>>>>>>  create mode 100644 configs/arm64/zynqmp-zcu102-inmate-demo-col.c
>>>>>>>>>  create mode 100644 hypervisor/arch/arm/include/asm/coloring.h
>>>>>>>>>  create mode 100644 hypervisor/arch/arm64/cache_layout.c
>>>>>>>>>  create mode 100644 hypervisor/arch/arm64/coloring.c
>>>>>>>>>  create mode 100644 hypervisor/arch/arm64/include/asm/cache_layout.h
>>>>>>>>>  create mode 100644 hypervisor/arch/arm64/include/asm/coloring.h
>>>>>>>>>  create mode 100644 hypervisor/arch/x86/include/asm/coloring.h
>>>>>>>>>  create mode 100644 hypervisor/include/jailhouse/assert.h
>>>>>>>>>  create mode 100644 hypervisor/include/jailhouse/panic.h
>>>>>>>>>  create mode 100644 hypervisor/panic.c
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for the update! I assume it's functional and can be used in
>>>>>>>> tests? Asking as there seems to be a lot of debug code and comments.
>>>>>>>
>>>>>>> Yes, it is functional and it can be used for testing. I assumed this 
>>>>>>> wasn't the
>>>>>>> final version, and the debug code + commented out code are still there 
>>>>>>> in the
>>>>>>> places where it can be useful to take a look to understand how the 
>>>>>>> machinery is
>>>>>>> working.
>>>>>>>
>>>>>>> I'm also interested in your point of view regarding having a 
>>>>>>> CONFIG_DEBUG option
>>>>>>> in Jailhouse. Basically a "debugging/development-enabled" version (with 
>>>>>>> assert()
>>>>>>> and extra verbose output), and a "production" one.
>>>>>>>
>>>>>>>> Scanning through it, I still find it very arm64-centric, rather than
>>>>>>>
>>>>>>> It is. As mentioned, I don't know if it is meaningful to have a x86 
>>>>>>> version of
>>>>>>> the coloring, and for how long there will be the need to have coloring 
>>>>>>> in
>>>>>>> software. So, the patches try to keep the changes in both hypervisor 
>>>>>>> and driver
>>>>>>> to a minimum.
>>>>>>
>>>>>> x86 might not be meaningful for modern CPUs with CAT, but it is another
>>>>>> test case. But I'm also thinking of RISC-V and what else might come 
>>>>>> along.
>>>>>
>>>>> Coloring in software on x86 might not be trivial. But yes, RISC-V could 
>>>>> be a
>>>>> test case, and maybe there will be some hardware support that can be 
>>>>> exploited
>>>>> there...
>>>>>
>>>>>>>> generically addressing the topic as I pointed out in previous reviews.
>>>>>>>> And keeping modifications of the driver minimal is not necessarily a
>>>>>>>> sign we are on the same page already.
>>>>>>>
>>>>>>> We've experimented also with a version that does "more" in the driver. 
>>>>>>> I think
>>>>>>> that more changes in the driver only add complexity. In the end the 
>>>>>>> hypervisor
>>>>>>> have to distinguish anyway between colored and non-colored mapping, and
>>>>>>> scattering the load logic between driver and hypervisor has more 
>>>>>>> maintenance
>>>>>>> effort than a single clean configuration parameter.
>>>>>>
>>>>>> You are possibly right, it's just hard for me to follow this as there is
>>>>>> no code or more concrete design to prove that argument.
>>>>>
>>>>> I've synched with Luca and Marco, and I've pushed here
>>>>>
>>>>> https://gitlab.com/bastoni/jailhouse/-/commits/wip/design/coloring-driver/
>>>>>
>>>>> some relevant commits that show how a possible driver-integrated solution 
>>>>> could
>>>>> look like.
>>>>>
>>>>> Note that the "jailhouse_ioremap_col()" function in driver/main.c 
>>>>> basically
>>>>> replicates the same coloring loop that should also be provided by the 
>>>>> hypervisor
>>>>> to do colored-mapping and how the logic about the "next_colored()" should 
>>>>> be
>>>>> "moved-up" to be shared between HV and driver.
>>>>
>>>> Need to look into that, so a blind shot: If it's mere replication of
>>>> logic, a common include with the core functionality can help to avoid
>>>> duplication. If it's more, this is a point.
>>>>
>>>
>>> Finally had the time to study: I don't see a major issue with the
>>
>> Thank you for checking this.
>>
>>> jailhouse_ioremap_col replicating the mapping logic the hypervisor does.
>>> The core logic is in next_colored, and that is shared.
>>> jailhouse_ioremap_col is a simple loop.
>>>
>>> However, jailhouse_ioremap_col should be enhanced to a generic
>>> jailhouse_ioremap that maps also uncolored memory the same way. To my
>>> understanding, next_colored already returns the next physical page in
>>> the uncolored case.
>>>
>>>>>
>>>>> This non-clean separation, and the fact that the IPA/PA mapping should not
>>>>> belong to the driver, but only to the HV are among the reasons why we 
>>>>> moved to
>>>>> the solution presented in the patch series.
>>>>
>>>> If code overall grows massively, it's valid to consolidate in the HV. If
>>>> it's just about having it somewhere, the driver is the better place.
>>>>
>>>
>>> Would be interesting to see the diffstat of the hypervisor with your
>>> driver coloring extension. The cache layout detection leaves to
>>> hypervisor, and no new hypercall is added. Should be worth it.
>>
>> What do you mean here? In the version posted in the patches (the most updated
>> version) there is no hypercall. Do you have in mind a cross-over between the 
>> two
>> versions?
>>
>> What is the objective you would like to achieve from a design point of view?
>>
> 
> Right, there is no special call anymore. I meant the special handling of
> colored regions for set-loadable, though. That can be avoided.

The cost of avoiding it, is to make Linux running in the root-cell aware that a
mapping is colored, and have something similar to the loops in "color_cell_op()"
also in the ioremap_colored() in Linux.

(In the wip/design/coloring-driver version, this loop complexity is hidden in
next_colored() + next_color(), but the basic complexity is the same.)

To me, coloring should just effect the IPA - PA translation, and except for the
configuration dependent offset, the coloring can be kept transparent for Linux.

> The hypervisor then just needs to apply generic coloring rules when
> mapping regions into cells. And the root cell will request the loadable
> region by adding the non-root cell's color to its respective region.
> 
> Jan
> 

-- 
Thanks,
Andrea Bastoni

-- 
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jailhouse-dev/fc33efe0-deec-95cf-1fda-7f03dabac6fa%40tum.de.

Reply via email to