On 10/31/2011 10:12 PM, Scott Wood wrote:
> >> +4.59 KVM_DIRTY_TLB
> >> +
> >> +Capability: KVM_CAP_SW_TLB
> >> +Architectures: ppc
> >> +Type: vcpu ioctl
> >> +Parameters: struct kvm_dirty_tlb (in)
> >> +Returns: 0 on success, -1 on error
> >> +
> >> +struct kvm_dirty_tlb {
> >> + __u64 bitmap;
> >> + __u32 num_dirty;
> >> +};
> >
> > This is not 32/64 bit safe. e500 is 32-bit only, yes?
>
> e5500 is 64-bit -- we don't support it with KVM yet, but it's planned.
>
> > but what if someone wants to emulate an e500 on a ppc64? maybe it's better
> > to add padding here.
>
> What is unsafe about it? Are you picturing TLBs with more than 4
> billion entries?
sizeof(struct kvm_dirty_tlb) == 12 for 32-bit userspace, but == 16 for
64-bit userspace and the kernel. ABI structures must have the same
alignment and size for 32-bit and 64-bit userspace, or they need compat
handling.
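To make the size difference concrete, here is a minimal sketch comparing the struct as posted with an explicitly padded variant. The struct names and the "reserved" field are hypothetical, chosen only for illustration. It assumes a C11 compiler, and it compiles against <stdint.h> rather than the kernel's __u64/__u32 types.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* Mirror of the structure as posted in the patch. Its size includes
 * trailing padding that depends on how the ABI aligns a 64-bit
 * integer: 16 bytes where uint64_t is 8-byte aligned (e.g. x86-64),
 * 12 bytes where it is only 4-byte aligned (e.g. i386). */
struct kvm_dirty_tlb_as_posted {
	uint64_t bitmap;
	uint32_t num_dirty;
};

/* Hypothetical padded variant: the explicit 32-bit field makes the
 * size 16 bytes regardless of 64-bit alignment, so 32-bit and 64-bit
 * userspace agree with the kernel without compat handling. */
struct kvm_dirty_tlb_padded {
	uint64_t bitmap;
	uint32_t num_dirty;
	uint32_t reserved;	/* illustrative name; explicit padding */
};

static_assert(sizeof(struct kvm_dirty_tlb_padded) == 16,
	      "padded layout is 16 bytes on every ABI");
static_assert(offsetof(struct kvm_dirty_tlb_padded, num_dirty) == 8,
	      "num_dirty directly follows the 64-bit bitmap");
```

With the padding spelled out, the layout no longer depends on the compiler's alignment rules, which is the property an ioctl ABI struct needs.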
> There shouldn't be any alignment issues.
>
> > Another alternative is to drop the num_dirty field (and let the kernel
> > compute it instead, shouldn't take long?), and have the third argument
> > to ioctl() reference the bitmap directly.
>
> The idea was to make it possible for the kernel to apply a threshold
> above which it would be better to ignore the bitmap entirely and flush
> everything:
>
> http://www.spinics.net/lists/kvm/msg50079.html
>
> Currently we always just flush everything, and QEMU always says
> everything is dirty when it makes a change, but the API is there if needed.
Right, but you don't need num_dirty for it. There are typically only a
few dozen entries, yes? It should take a trivial amount of time to
calculate the bitmap's weight (its number of set bits).
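As a rough illustration of how cheap this is, here is a userspace sketch that computes the weight directly and applies a threshold. The function names and the percentage-based threshold policy are hypothetical. In-kernel code would presumably use the existing hweight64() or bitmap_weight() helpers rather than the hand-rolled loop below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of set bits in one 64-bit word (Kernighan's method). */
unsigned int popcount64(uint64_t x)
{
	unsigned int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Weight of a dirty bitmap covering num_entries TLB entries; bits past
 * num_entries in the final word are ignored. */
unsigned int dirty_weight(const uint64_t *bitmap, size_t num_entries)
{
	size_t full = num_entries / 64, rem = num_entries % 64, i;
	unsigned int weight = 0;

	assert(bitmap != NULL || num_entries == 0);
	for (i = 0; i < full; i++)
		weight += popcount64(bitmap[i]);
	if (rem)
		weight += popcount64(bitmap[full] & ((UINT64_C(1) << rem) - 1));
	return weight;
}

/* Hypothetical policy: flush everything when more than threshold_pct
 * percent of the entries are dirty, otherwise update entry by entry. */
int should_flush_all(const uint64_t *bitmap, size_t num_entries,
		     unsigned int threshold_pct)
{
	size_t dirty = dirty_weight(bitmap, num_entries);

	return dirty * 100 > num_entries * threshold_pct;
}
```

For a TLB with a few dozen (or even a few hundred) entries this is a handful of word operations, so the kernel can derive the count itself instead of trusting a separate num_dirty field from userspace.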
> >> +Configures the virtual CPU's TLB array, establishing a shared memory area
> >> +between userspace and KVM. The "params" and "array" fields are userspace
> >> +addresses of mmu-type-specific data structures. The "array_len" field is a
> >> +safety mechanism, and should be set to the size in bytes of the memory that
> >> +userspace has reserved for the array. It must be at least the size dictated
> >> +by "mmu_type" and "params".
> >> +
> >> +While KVM_RUN is active, the shared region is under control of KVM. Its
> >> +contents are undefined, and any modification by userspace results in
> >> +boundedly undefined behavior.
> >> +
> >> +On return from KVM_RUN, the shared region will reflect the current state of
> >> +the guest's TLB. If userspace makes any changes, it must call KVM_DIRTY_TLB
> >> +to tell KVM which entries have been changed, prior to calling KVM_RUN again
> >> +on this vcpu.
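The userspace half of this protocol can be sketched as follows: after KVM_RUN returns, userspace edits entries in the shared array, sets one bit per modified entry, and hands the bitmap to KVM_DIRTY_TLB before the next KVM_RUN. The helper names and the local mirror struct are hypothetical (they stand in for struct kvm_dirty_tlb so the sketch builds without kernel headers), and the ioctl itself is only described in a comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed struct kvm_dirty_tlb; not the uapi header. */
struct dirty_tlb_req {
	uint64_t bitmap;	/* userspace address of the dirty bitmap */
	uint32_t num_dirty;	/* number of set bits, as proposed */
};

/* Set the bit for TLB array entry `index`; returns 1 if it was newly
 * set, so the caller can keep num_dirty in sync. */
int mark_entry_dirty(uint64_t *bitmap, unsigned int index)
{
	uint64_t mask = UINT64_C(1) << (index % 64);
	int newly = !(bitmap[index / 64] & mask);

	bitmap[index / 64] |= mask;
	return newly;
}

/* Build the request that would be passed as
 * ioctl(vcpu_fd, KVM_DIRTY_TLB, &req) before the next KVM_RUN. */
struct dirty_tlb_req make_dirty_req(uint64_t *bitmap, uint32_t num_dirty)
{
	struct dirty_tlb_req req;

	memset(&req, 0, sizeof(req));	/* no stray bytes in padding */
	req.bitmap = (uint64_t)(uintptr_t)bitmap;
	req.num_dirty = num_dirty;
	return req;
}
```

Passing the bitmap as a u64 address rather than a pointer keeps the field the same width for 32-bit and 64-bit userspace.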
> >
> > We already have another mechanism for such shared memory,
> > mmap(vcpu_fd). x86 uses it for the coalesced mmio region as well as the
> > traditional kvm_run area. Please consider using it.
>
> What does it buy us, other than needing a separate codepath in QEMU to
> allocate the memory differently based on whether KVM (and this feature)
> are being used, since QEMU uses this for its own MMU representation?
The ability to use get_free_pages() and ordinary kernel memory directly,
instead of indirection through a struct page ** array.
>
> This API has been discussed extensively, and the code using it is
> already in mainline QEMU. This aspect of it hasn't changed since the
> discussion back in February:
>
> http://www.spinics.net/lists/kvm/msg50102.html
>
> I'd prefer to avoid another round of major overhaul without a really
> good reason.
Me too, but I also prefer not to make ABI choices by inertia. ABI is
practically the only thing I care about wrt non-x86 (other than
whitespace, of course). Please involve me earlier in such discussions
in the future.
> >> +For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
> >> + - The "params" field is of type "struct kvm_book3e_206_tlb_params".
> >> + - The "array" field points to an array of type "struct
> >> + kvm_book3e_206_tlb_entry".
> >> + - The array consists of all entries in the first TLB, followed by all
> >> + entries in the second TLB.
> >> + - Within a TLB, entries are ordered first by increasing set number. Within a
> >> + set, entries are ordered by way (increasing ESEL).
> >> + - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
> >> + where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
> >> + - The tsize field of mas1 shall be set to 4K on TLB0, even though the
> >> + hardware ignores this value for TLB0.
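To check that these rules pin down a unique array slot, here is a small sketch that computes the TLB0 set from MAS2 and the index of an entry in the shared array. The struct below is a hypothetical two-TLB stand-in for the tlb_sizes[]/tlb_ways[] fields of struct kvm_book3e_206_tlb_params, and the function names are illustrative. It assumes the number of sets is a power of two, which the masking hash requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry mirroring tlb_sizes[]/tlb_ways[] (two TLBs). */
struct tlb_geom {
	uint32_t tlb_sizes[2];	/* total entries per TLB */
	uint32_t tlb_ways[2];	/* associativity per TLB */
};

/* TLB0 set number: (MAS2 >> 12) & (num_sets - 1). */
uint32_t tlb0_set(const struct tlb_geom *g, uint64_t mas2)
{
	uint32_t num_sets = g->tlb_sizes[0] / g->tlb_ways[0];

	assert(num_sets && (num_sets & (num_sets - 1)) == 0);
	return (uint32_t)((mas2 >> 12) & (num_sets - 1));
}

/* Index into the shared array: all of TLB0 first, then TLB1; within a
 * TLB, ordered by set, then by way (ESEL). */
uint32_t tlb_array_index(const struct tlb_geom *g, unsigned int tlbsel,
			 uint32_t set, uint32_t way)
{
	uint32_t base;

	assert(tlbsel < 2);
	assert(way < g->tlb_ways[tlbsel]);
	base = (tlbsel == 0) ? 0 : g->tlb_sizes[0];
	return base + set * g->tlb_ways[tlbsel] + way;
}
```

For example, with a 512-entry 4-way TLB0 (128 sets) and a 64-entry fully associative TLB1, an entry whose MAS2 is 0x12345000 lands in TLB0 set 0x45, and way 2 of that set is array slot 69 * 4 + 2 = 278.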
> >
> > Holy shit.
>
> You were the one that first suggested we use shared data:
> http://www.spinics.net/lists/kvm/msg49802.html
>
> These are the assumptions needed to make such an interface well-defined.
Just remarking on the complexity; don't take it personally.
> >> @@ -95,6 +90,9 @@ struct kvmppc_vcpu_e500 {
> >> u32 tlb1cfg;
> >> u64 mcar;
> >>
> >> + struct page **shared_tlb_pages;
> >> + int num_shared_tlb_pages;
> >> +
> >
> > I missed the requirement that things be page aligned.
>
> They don't need to be, we'll ignore the data before and after the shared
> area.
>
> > If you use mmap(vcpu_fd) this becomes simpler; you can use
> > get_free_pages() and have a single pointer. You can also use vmap() on
> > this array (but get_free_pages() is faster).
>
> We do use vmap(). This is just the bookkeeping so we know what pages to
> free later.
>
Ah, I missed that (and the pointer).
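Since the array need not be page aligned, the kernel has to pin every page the array touches, vmap() them, and remember which pages to release later. The arithmetic for which pages cover an arbitrary userspace range can be sketched as below; the fixed 4 KiB page size, the struct, and the function name are assumptions for illustration, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

struct page_span {
	uintptr_t first_page;	/* page-aligned start address */
	size_t num_pages;	/* pages that must be pinned */
	size_t offset;		/* offset of the array within first page */
};

/* Pages covering [addr, addr + len); data before `offset` in the first
 * page and after the array in the last page is simply ignored. */
struct page_span covering_pages(uintptr_t addr, size_t len)
{
	struct page_span s;
	uintptr_t end;

	assert(len > 0);
	end = addr + len;	/* one past the last byte */
	s.first_page = addr & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
	s.num_pages = (end - s.first_page + SKETCH_PAGE_SIZE - 1) /
		      SKETCH_PAGE_SIZE;
	s.offset = addr - s.first_page;
	return s;
}
```

The num_pages value corresponds to what a field like num_shared_tlb_pages would track, and the vmap()ed address plus offset gives the kernel's pointer to the array.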
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.