On 11.01.21 13:33, Andrea Bastoni wrote:
> On 11/01/2021 12:46, Jan Kiszka wrote:
>> On 11.01.21 12:21, Andrea Bastoni wrote:
>>> Hi Jan,
>>>
>>> On 11/01/2021 08:36, Jan Kiszka wrote:
>>>> On 25.11.20 17:12, Jan Kiszka wrote:
>>>>> On 25.11.20 16:59, Andrea Bastoni wrote:
>>>>>> On 25/11/2020 11:51, Jan Kiszka wrote:
>>>>>>> On 25.11.20 10:15, Andrea Bastoni wrote:
>>>>>>>> On 25/11/2020 07:07, Jan Kiszka wrote:
>>>>>>>>> On 23.11.20 21:45, Andrea Bastoni wrote:
>>>>>>>>>> Hello Jan, hi all,
>>>>>>>>>>
>>>>>>>>>> As discussed a couple of weeks ago, here an updated version of the 
>>>>>>>>>> "cache-coloring" for Jailhouse.
>>>>>>>>>>
>>>>>>>>>> A short recap on what's coloring (see 
>>>>>>>>>> Documentation/cache-coloring.md):
>>>>>>>>>>> Cache coloring is a software technique that permits LLC 
>>>>>>>>>>> partitioning,
>>>>>>>>>>> therefore eliminating mutual core interference, and thus 
>>>>>>>>>>> guaranteeing more
>>>>>>>>>>> predictable performances for memory accesses.
>>>>>>>>>>
>>>>>>>>>> In this version of the patch, we've tried to keep the changes in the 
>>>>>>>>>> hypervisor core minimal. Also, since coloring is something that 
>>>>>>>>>> should be transparent to inmates/VM (coloring belongs to the IPA/PA 
>>>>>>>>>> hypervisor translations), the Linux driver has been only minimally 
>>>>>>>>>> modified.
>>>>>>>>>>
>>>>>>>>>> Coloring API
>>>>>>>>>> ------------
>>>>>>>>>>
>>>>>>>>>> The patch adds a small architecture API layer that implements 
>>>>>>>>>> coloring. The API is fully implemented only on arm64.  arm32 could 
>>>>>>>>>> be also a possible target for coloring, x86 already has CAT. Despite 
>>>>>>>>>> being implemented by only one architecture, having a small common 
>>>>>>>>>> API layer integrates IMHO better than other solutions in the main 
>>>>>>>>>> code (without adding conditionally compiled code).
>>>>>>>>>>
>>>>>>>>>> The API is:
>>>>>>>>>>     arch_color_map_memory_region()   // cell_create
>>>>>>>>>>         Color-maps the memory of the cell. Coloring requires 
>>>>>>>>>> multiple passes to keep a contiguous VA, while "punching holes" in 
>>>>>>>>>> the PA according to the allocated colors.
>>>>>>>>>>
>>>>>>>>>>     arch_color_unmap_memory_region() // cell_destroy
>>>>>>>>>>         The opposite of create: search and give back each piece of 
>>>>>>>>>> colored PA.
>>>>>>>>>>
>>>>>>>>>>     arch_color_remap_to_root()       // cell_load
>>>>>>>>>>         Given that we don't want to modify the driver (i.e., we 
>>>>>>>>>> don't want the driver to do non-contiguous color-ioremap) we let the 
>>>>>>>>>> root cell believe that the IPAs it is ioremapping are contiguous, 
>>>>>>>>>> but in reality the PA are non-contiguous (colored). This is 
>>>>>>>>>> currently done by using a very high VA (IPA) address as "base" for 
>>>>>>>>>> the load-remap-to-root-cell operation. This value is configuration 
>>>>>>>>>> dependent and can be specified in the root cell to avoid (unlikely) 
>>>>>>>>>> conflicts.
>>>>>>>>>>
>>>>>>>>>>     arch_color_unmap_from_root()     // cell_start
>>>>>>>>>>         Returns the mapping setup by the cell_load to the root cell.
>>>>>>>>>>
>>>>>>>>>> Two APIs are implemented only by arm64:
>>>>>>>>>>     arm_color_dcache_flush_memory_region()  // range flushing
>>>>>>>>>>         Also flushing should be done in a "colored way".
>>>>>>>>>>
>>>>>>>>>>     arm_color_init()                        // hook for the 
>>>>>>>>>> initialization of coloring
>>>>>>>>>>         Reads the parameters (size of the LLC way, base offset for 
>>>>>>>>>> the load-remapping operation) needed by coloring. There's also some 
>>>>>>>>>> debugging code to autodetect the cache-geometry and determine the 
>>>>>>>>>> size of the way.
>>>>>>>>>>
>>>>>>>>>> Configuration
>>>>>>>>>> -------------
>>>>>>>>>>
>>>>>>>>>> From the configuration point of view, a colored memory region is a 
>>>>>>>>>> normal memory region with a color attached. The color is expressed 
>>>>>>>>>> directly in the memory region as color-bitmask and the flag 
>>>>>>>>>> JAILHOUSE_MEM_COLORED is used to identify such a region. Contiguous 
>>>>>>>>>> bits in the bitmask identify a color (region) to be used for the 
>>>>>>>>>> memory region. The size of the mapping doesn't change due to 
>>>>>>>>>> coloring.
>>>>>>>>>>
>>>>>>>>>> For example, with 16 colors, a full way size is selected by 0xffff, 
>>>>>>>>>> while 1/4 of the way size is selected by 0x000f.
>>>>>>>>>>
>>>>>>>>>> (Note: we also experimented with a different version that defined 
>>>>>>>>>> separate "normal" and "colored" memory regions. The approach 
>>>>>>>>>> presented in this version is more robust.)
>>>>>>>>>>
>>>>>>>>>> Main coloring "loop"
>>>>>>>>>> --------------------
>>>>>>>>>>
>>>>>>>>>> The coloring APIs basically boil down to a loop that appropriately 
>>>>>>>>>> selects --according to the color-- the physical addresses where to 
>>>>>>>>>> perform a selected operation. The selection is done efficiently with 
>>>>>>>>>> bit operations on the bitmask.
>>>>>>>>>>
>>>>>>>>>> An alternative approach is to hook coloring deep into the mapping 
>>>>>>>>>> functionalities of the hypervisor. Basically, all low level mapping 
>>>>>>>>>> functions support coloring, and the non-colored case becomes the 
>>>>>>>>>> "special case."
>>>>>>>>>> This approach was not followed because to be implemented "cleanly", 
>>>>>>>>>> all the architecture mapping APIs have to be extended to support 
>>>>>>>>>> coloring, but only one-architecture then really implement it.
>>>>>>>>>>
>>>>>>>>>> Instead, the implemented approach to have an additional "coloring" 
>>>>>>>>>> API integrates more nicely. Additionally, coloring will be hopefully 
>>>>>>>>>> only a transitory techniques (some years?) that could be eventually 
>>>>>>>>>> replaced by hardware-based techniques (e.g., MPAM). When this 
>>>>>>>>>> happens, removing the additional coloring API requires less rework 
>>>>>>>>>> than the other approach.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Debugging Code
>>>>>>>>>> --------------
>>>>>>>>>>
>>>>>>>>>> The patches also provide a "debugging-only" code that can be 
>>>>>>>>>> activated by defining CONFIG_DEBUG (it sounded more 
>>>>>>>>>> Jailhouse-oriented than NDEBUG). In debugging mode, the coloring 
>>>>>>>>>> provides an autodetection for arm64 cache geometry.
>>>>>>>>>> Additionally, I've added an assert() function that can be used at 
>>>>>>>>>> runtime in debugging only contexts (could be used independently from 
>>>>>>>>>> coloring).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Feedback and comments welcome.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Andrea
>>>>>>>>>>
>>>>>>>>>> Andrea Bastoni (12):
>>>>>>>>>>   arm-common: bitops: add most-significant-bit operation
>>>>>>>>>>   hypervisor, driver, archs: add basic empty infrastructure for 
>>>>>>>>>> coloring
>>>>>>>>>>   hypervisor: arm64: add cache coloring implementation
>>>>>>>>>>   hypervisor configuration: hook autodetection for coloring_way_size
>>>>>>>>>>   configs: arm64: add example configuration for colored ZCU102 inmate
>>>>>>>>>>   hypervisor: protect inclusion of control.h
>>>>>>>>>>   hypervisor, driver: add platform information to configure coloring
>>>>>>>>>>     params
>>>>>>>>>>   configs: arm64: hook-in coloring parameters for ZCU102
>>>>>>>>>>   hypervisor: provide runtime assert() helper for DEBUG only
>>>>>>>>>>   hypervisor: provide implementation for __assert_fail() and group
>>>>>>>>>>     panic-related functions
>>>>>>>>>>   hypervisor: coloring: use assert instead of "BUG"
>>>>>>>>>>   hypervisor: coloring: make cache autodetection debug-only
>>>>>>>>>>
>>>>>>>>>> Luca Miccio (2):
>>>>>>>>>>   Documentation: add description and usage of cache coloring support
>>>>>>>>>>   pyjailhouse: add support for colored regions
>>>>>>>>>>
>>>>>>>>>>  Documentation/cache-coloring.md               | 198 
>>>>>>>>>> ++++++++++++++++++
>>>>>>>>>>  configs/arm64/zynqmp-zcu102-inmate-demo-col.c |  75 +++++++
>>>>>>>>>>  configs/arm64/zynqmp-zcu102.c                 |   6 +
>>>>>>>>>>  driver/cell.c                                 |  10 +-
>>>>>>>>>>  driver/cell.h                                 |   1 +
>>>>>>>>>>  driver/main.c                                 |  12 ++
>>>>>>>>>>  hypervisor/Makefile                           |   2 +-
>>>>>>>>>>  hypervisor/arch/arm-common/control.c          |   1 +
>>>>>>>>>>  .../arch/arm-common/include/asm/bitops.h      |  10 +
>>>>>>>>>>  .../arch/arm-common/include/asm/dcaches.h     |   8 +
>>>>>>>>>>  hypervisor/arch/arm-common/mmu_cell.c         |  52 +++--
>>>>>>>>>>  hypervisor/arch/arm/control.c                 |   1 +
>>>>>>>>>>  hypervisor/arch/arm/include/asm/coloring.h    |  59 ++++++
>>>>>>>>>>  hypervisor/arch/arm/traps.c                   |   1 +
>>>>>>>>>>  hypervisor/arch/arm64/Kbuild                  |   5 +
>>>>>>>>>>  hypervisor/arch/arm64/cache_layout.c          | 148 +++++++++++++
>>>>>>>>>>  hypervisor/arch/arm64/coloring.c              | 184 ++++++++++++++++
>>>>>>>>>>  hypervisor/arch/arm64/control.c               |   1 +
>>>>>>>>>>  .../arch/arm64/include/asm/cache_layout.h     |  21 ++
>>>>>>>>>>  hypervisor/arch/arm64/include/asm/coloring.h  | 137 ++++++++++++
>>>>>>>>>>  hypervisor/arch/arm64/setup.c                 |   3 +
>>>>>>>>>>  hypervisor/arch/arm64/traps.c                 |   1 +
>>>>>>>>>>  hypervisor/arch/x86/amd_iommu.c               |   1 +
>>>>>>>>>>  hypervisor/arch/x86/control.c                 |   1 +
>>>>>>>>>>  hypervisor/arch/x86/efifb.c                   |   1 +
>>>>>>>>>>  hypervisor/arch/x86/include/asm/coloring.h    |  45 ++++
>>>>>>>>>>  hypervisor/arch/x86/ioapic.c                  |   1 +
>>>>>>>>>>  hypervisor/arch/x86/svm.c                     |   1 +
>>>>>>>>>>  hypervisor/arch/x86/vmx.c                     |   1 +
>>>>>>>>>>  hypervisor/control.c                          | 111 ++++------
>>>>>>>>>>  hypervisor/include/jailhouse/assert.h         |  37 ++++
>>>>>>>>>>  hypervisor/include/jailhouse/control.h        |  34 +--
>>>>>>>>>>  hypervisor/include/jailhouse/panic.h          |  40 ++++
>>>>>>>>>>  hypervisor/include/jailhouse/printk.h         |   4 +
>>>>>>>>>>  hypervisor/panic.c                            |  86 ++++++++
>>>>>>>>>>  hypervisor/pci.c                              |   1 +
>>>>>>>>>>  hypervisor/printk.c                           |   1 +
>>>>>>>>>>  hypervisor/uart.c                             |   1 +
>>>>>>>>>>  include/jailhouse/cell-config.h               |  11 +
>>>>>>>>>>  pyjailhouse/config_parser.py                  |   9 +-
>>>>>>>>>>  40 files changed, 1201 insertions(+), 121 deletions(-)
>>>>>>>>>>  create mode 100644 Documentation/cache-coloring.md
>>>>>>>>>>  create mode 100644 configs/arm64/zynqmp-zcu102-inmate-demo-col.c
>>>>>>>>>>  create mode 100644 hypervisor/arch/arm/include/asm/coloring.h
>>>>>>>>>>  create mode 100644 hypervisor/arch/arm64/cache_layout.c
>>>>>>>>>>  create mode 100644 hypervisor/arch/arm64/coloring.c
>>>>>>>>>>  create mode 100644 hypervisor/arch/arm64/include/asm/cache_layout.h
>>>>>>>>>>  create mode 100644 hypervisor/arch/arm64/include/asm/coloring.h
>>>>>>>>>>  create mode 100644 hypervisor/arch/x86/include/asm/coloring.h
>>>>>>>>>>  create mode 100644 hypervisor/include/jailhouse/assert.h
>>>>>>>>>>  create mode 100644 hypervisor/include/jailhouse/panic.h
>>>>>>>>>>  create mode 100644 hypervisor/panic.c
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for the update! I assume it's functional and can be used in
>>>>>>>>> tests? Asking as there seems to be a lot of debug code and comments.
>>>>>>>>
>>>>>>>> Yes, it is functional and it can be used for testing. I assumed this 
>>>>>>>> wasn't the
>>>>>>>> final version, and the debug code + commented out code are still there 
>>>>>>>> in the
>>>>>>>> places where it can be useful to take a look to understand how the 
>>>>>>>> machinery is
>>>>>>>> working.
>>>>>>>>
>>>>>>>> I'm also interested in your point of view regarding having a 
>>>>>>>> CONFIG_DEBUG option
>>>>>>>> in Jailhouse. Basically a "debugging/development-enabled" version 
>>>>>>>> (with assert()
>>>>>>>> and extra verbose output), and a "production" one.
>>>>>>>>
>>>>>>>>> Scanning through it, I still find it very arm64-centric, rather than
>>>>>>>>
>>>>>>>> It is. As mentioned, I don't know if it is meaningful to have a x86 
>>>>>>>> version of
>>>>>>>> the coloring, and for how long there will be the need to have coloring 
>>>>>>>> in
>>>>>>>> software. So, the patches try to keep the changes in both hypervisor 
>>>>>>>> and driver
>>>>>>>> to a minimum.
>>>>>>>
>>>>>>> x86 might not be meaningful for modern CPUs with CAT, but it is another
>>>>>>> test case. But I'm also thinking of RISC-V and what else might come 
>>>>>>> along.
>>>>>>
>>>>>> Coloring in software on x86 might not be trivial. But yes, RISC-V could 
>>>>>> be a
>>>>>> test case, and maybe there will be some hardware support that can be 
>>>>>> exploited
>>>>>> there...
>>>>>>
>>>>>>>>> generically addressing the topic as I pointed out in previous reviews.
>>>>>>>>> And keeping modifications of the driver minimal is not necessarily a
>>>>>>>>> sign we are on the same page already.
>>>>>>>>
>>>>>>>> We've experimented also with a version that does "more" in the driver. 
>>>>>>>> I think
>>>>>>>> that more changes in the driver only add complexity. In the end the 
>>>>>>>> hypervisor
>>>>>>>> have to distinguish anyway between colored and non-colored mapping, and
>>>>>>>> scattering the load logic between driver and hypervisor has more 
>>>>>>>> maintenance
>>>>>>>> effort than a single clean configuration parameter.
>>>>>>>
>>>>>>> You are possibly right, it's just hard for me to follow this as there is
>>>>>>> no code or more concrete design to prove that argument.
>>>>>>
>>>>>> I've synched with Luca and Marco, and I've pushed here
>>>>>>
>>>>>> https://gitlab.com/bastoni/jailhouse/-/commits/wip/design/coloring-driver/
>>>>>>
>>>>>> some relevant commits that show how a possible driver-integrated 
>>>>>> solution could
>>>>>> look like.
>>>>>>
>>>>>> Note that the "jailhouse_ioremap_col()" function in driver/main.c 
>>>>>> basically
>>>>>> replicates the same coloring loop that should also be provided by the 
>>>>>> hypervisor
>>>>>> to do colored-mapping and how the logic about the "next_colored()" 
>>>>>> should be
>>>>>> "moved-up" to be shared between HV and driver.
>>>>>
>>>>> Need to look into that, so a blind shot: If it's mere replication of
>>>>> logic, a common include with the core functionality can help to avoid
>>>>> duplication. If it's more, this is a point.
>>>>>
>>>>
>>>> Finally had the time to study: I don't see a major issue with the
>>>
>>> Thank you for checking this.
>>>
>>>> jailhouse_ioremap_col replicating the mapping logic the hypervisor does.
>>>> The core logic is in next_colored, and that is shared.
>>>> jailhouse_ioremap_col is a simple loop.
>>>>
>>>> However, jailhouse_ioremap_col should be enhanced to a generic
>>>> jailhouse_ioremap that maps also uncolored memory the same way. To my
>>>> understanding, next_colored already returns the next physical page in
>>>> the uncolored case.
>>>>
>>>>>>
>>>>>> This non-clean separation, and the fact that the IPA/PA mapping should 
>>>>>> not
>>>>>> belong to the driver, but only to the HV are among the reasons why we 
>>>>>> moved to
>>>>>> the solution presented in the patch series.
>>>>>
>>>>> If code overall grows massively, it's valid to consolidate in the HV. If
>>>>> it's just about having it somewhere, the driver is the better place.
>>>>>
>>>>
>>>> Would be interesting to see the diffstat of the hypervisor with your
>>>> driver coloring extension. The cache layout detection leaves to
>>>> hypervisor, and no new hypercall is added. Should be worth it.
>>>
>>> What do you mean here? In the version posted in the patches (the most 
>>> updated
>>> version) there is no hypercall. Do you have in mind a cross-over between 
>>> the two
>>> versions?
>>>
>>> What is the objective you would like to achieve from a design point of view?
>>>
>>
>> Right, there is no special call anymore. I meant the special handling of
>> colored regions for set-loadable, though. That can be avoided.
> 
> The cost of avoiding it, is to make Linux running in the root-cell aware that 
> a
> mapping is colored, and have something similar to the loops in 
> "color_cell_op()"
> also in the ioremap_colored() in Linux.
> 
> (In the wip/design/coloring-driver version, this loop complexity is hidden in
> next_colored() + next_color(), but the basic complexity is the same.)
> 
> To me, coloring should just effect the IPA - PA translation, and except for 
> the
> configuration dependent offset, the coloring can be kept transparent for 
> Linux.

Let's argue over diffstats. We already have one for the transparent
option, and you have the driver bits for opaque option. Please try to
plug in the hypervisor, and then we can see. My understanding is that
this should be predominately about removing logic from the hypervisor,
thus not to lengthy to implement.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

-- 
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jailhouse-dev/f7777383-0a65-16c3-25d2-fbf38789f9de%40siemens.com.

Reply via email to