Somewhat, but in the regular case where enter() is called followed by
leave(), there is really no added complexity for the caller - just an
extra local variable.

There are complications where we want to exit lazy_mmu temporarily, as
in mm/kasan/shadow.c [1], but this is in fact unavoidable. Chatting
with Mark Rutland, I realised that to truly support nested sections,
this must be handled in a special way in any case. To be clear, I am
referring to this situation:

__kasan_populate_vmalloc:
     apply_to_page_range:
         arch_enter_lazy_mmu_mode() {1}

         kasan_populate_vmalloc_pte:
             arch_leave_lazy_mmu_mode() {2}
             arch_enter_lazy_mmu_mode() {3}

         arch_leave_lazy_mmu_mode() {4}

With the approach this series takes, call {2} is made safe by passing a
special parameter (say LAZY_MMU_FLUSH) that forces lazy_mmu to be fully
exited - and call {3} will then re-enter lazy_mmu. This works regardless
of whether __kasan_populate_vmalloc() has been called with lazy_mmu
already enabled (i.e. calls {1} and {4} can be nested).
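
For illustration only, a rough sketch of what the kasan callback could
look like with this approach - signatures simplified, the
lazy_mmu_state_t plumbing from the series omitted, and LAZY_MMU_FLUSH
being just the placeholder name used above:

static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
                                      void *unused)
{
        /*
         * Call {2}: force a full exit from lazy_mmu, regardless of how
         * deeply the caller is nested.
         */
        arch_leave_lazy_mmu_mode(LAZY_MMU_FLUSH);

        /* ... do the work that must not run under lazy_mmu ... */

        /* Call {3}: re-enter lazy_mmu for the rest of the range. */
        arch_enter_lazy_mmu_mode();

        return 0;
}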

On the other hand, with a pagefault_disabled-like approach, there is no
way to instruct call {2} to fully exit lazy_mmu regardless of the
nesting level.

Sure there is, with a better API. See below. :)


It would be possible to make both approaches work by introducing a new
API, along the lines of:
- int arch_disable_save_lazy_mmu_mode() (the return value indicates the
nesting level)
- void arch_restore_lazy_mmu_mode(int state) (re-enter lazy_mmu at the
given nesting level)
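
To make that concrete, a rough usage sketch (the caller and its name
are made up, prototypes as listed above):

static void populate_page_outside_lazy_mmu(unsigned long addr)
{
        int nesting;

        /* Fully exit lazy_mmu, whatever the current nesting level. */
        nesting = arch_disable_save_lazy_mmu_mode();

        /* ... work that must not run with deferred PTE updates ... */

        /* Re-enter lazy_mmu at the saved nesting level. */
        arch_restore_lazy_mmu_mode(nesting);
}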

Yes, I think we really need a proper API.


This is arguably more self-documenting than passing LAZY_MMU_FLUSH in
call {2}. However, this API is no simpler when using a
pagefault_disabled-like approach (and less consistent than always
saving the state on the stack).

Yes, a proper API is warranted. In particular, I'm thinking about the following:

arch_enter_lazy_mmu_mode() {1}
        arch_enter_lazy_mmu_mode() {2}

        kasan_populate_vmalloc_pte:
                arch_leave_lazy_mmu_mode() {3}
                arch_enter_lazy_mmu_mode() {4}

        arch_leave_lazy_mmu_mode() {5}
arch_leave_lazy_mmu_mode() {6}


Imagine if we had the following API instead:

lazy_mmu_enable() {1}
        lazy_mmu_enable() {2}

        kasan_populate_vmalloc_pte:
                lazy_mmu_pause() {3}
                lazy_mmu_continue() {4}

        lazy_mmu_disable() {5}
lazy_mmu_disable() {6}


I think it is crucial that while we are paused (between lazy_mmu_pause() and lazy_mmu_continue()), no further nesting happens.

Assume we store in the task_struct

uint8_t lazy_mmu_enabled_count;
bool lazy_mmu_paused;

We can do things like

a) Sanity check that, while we are paused, we get no more enable/disable requests
b) Sanity check that, while we are paused, we get no more pause requests.
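
Something along these lines (a sketch only - assuming the arch hooks only need to run on the outermost transition, and using VM_WARN_ON() arbitrarily for the sanity checks):

void lazy_mmu_enable(void)
{
        /* a) no enable while paused */
        VM_WARN_ON(current->lazy_mmu_paused);
        if (!current->lazy_mmu_enabled_count++)
                arch_enter_lazy_mmu_mode();
}

void lazy_mmu_disable(void)
{
        /* a) no disable while paused */
        VM_WARN_ON(current->lazy_mmu_paused);
        VM_WARN_ON(!current->lazy_mmu_enabled_count);
        if (!--current->lazy_mmu_enabled_count)
                arch_leave_lazy_mmu_mode();
}

void lazy_mmu_pause(void)
{
        /* b) no pause while already paused */
        VM_WARN_ON(current->lazy_mmu_paused);
        current->lazy_mmu_paused = true;
        if (current->lazy_mmu_enabled_count)
                arch_leave_lazy_mmu_mode();
}

void lazy_mmu_continue(void)
{
        VM_WARN_ON(!current->lazy_mmu_paused);
        current->lazy_mmu_paused = false;
        if (current->lazy_mmu_enabled_count)
                arch_enter_lazy_mmu_mode();
}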

[...]


If LAZY_MMU_DEFAULT etc. are not for common code, then please
maintain them for the individual archs as well, just like you do with the
opaque type.

I see your point - having them defined in <linux/mm_types.h> could be
misleading. I just wanted to avoid all 4 architectures defining the same
macros. Maybe call them __LAZY_MMU_* to suggest they're not supposed to
be used in generic code?

Maybe look into avoiding them completely :) Let's agree on the API first and then figure out how to pass the information we need to pass.

[...]

Worse, it does not
truly enable states to be nested: it allows the outermost section to
store some state, but nested sections cannot allocate extra space. This
is really what the stack is for.

If it's really just 8 bytes I don't really see the problem. So likely
there is
more to it?

I suppose 8 extra bytes per task is acceptable, but some architectures
may want to add more state there.

Just for reference: we currently perform an order-2 allocation, effectively leaving ~4KiB "unused".

If there is any real case like that on the horizon where we need to store significantly more (in which case storing it on the stack would probably also be bad), please let me know.


The one case that is truly problematic (though not required at this
point) is where each (nested) section needs to store its own state. With
this series it works just fine, as there is a lazy_mmu_state_t for each
section; however, if we use task_struct/thread_struct there can be only
one member shared by all nested sections.
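
For illustration, roughly the pattern this allows (sketch only,
function names made up):

static void inner(void)
{
        lazy_mmu_state_t state = arch_enter_lazy_mmu_mode();

        /* ... this nested section's state lives in its own stack frame ... */

        arch_leave_lazy_mmu_mode(state);
}

static void outer(void)
{
        lazy_mmu_state_t state = arch_enter_lazy_mmu_mode();

        inner();        /* nested section, with its own lazy_mmu_state_t */

        arch_leave_lazy_mmu_mode(state);
}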

Do we have a use case for that on the horizon? If so, I fully agree, we have to store information per level. How/what information we have to store would be another question.

--
Cheers

David / dhildenb

