Hi Marc,
On 9/2/20 12:10 PM, Marc Zyngier wrote:
> On 2020-09-02 11:59, Alexandru Elisei wrote:
>> Hi,
>>
>> On 8/22/20 3:44 AM, Gavin Shan wrote:
>>> Depending on the kernel configuration, PUD_SIZE could be equal to
>>> PMD_SIZE. For example, both of them are 512MB with the following
>>> kernel configuration. In this case, both PUD and PMD are folded
>>> to PGD.
>>>
>>> CONFIG_ARM64_64K_PAGES y
>>> CONFIG_ARM64_VA_BITS 42
>>> CONFIG_PGTABLE_LEVELS 2
>>>
>>> With the above configuration, a stage2 PUD entry is used to back the
>>> 512MB huge page when the stage2 mapping is built. During the mapping,
>>> the PUD and its subordinate levels of page table entries are unmapped
>>> in stage2_set_pud_huge() if the PUD is present but not a huge page.
>>> Unfortunately, @addr isn't aligned to S2_PUD_SIZE, so the wrong page
>>> table entries are zapped. As a result, the PUD's present bit can never
>>> be cleared and stage2_set_pud_huge() spins in an infinite loop.
>>>
>>> Fix the issue by checking against S2_{PUD, PMD}_SIZE instead of
>>> {PUD, PMD}_SIZE to determine whether a stage2 PUD or PMD is used to
>>> back the huge page. For this particular configuration, a stage2 PMD
>>> entry should be used to back the 512MB huge page via
>>> stage2_set_pmd_huge().
>>
>> I can reproduce this on my rockpro64 using kvmtool.
>>
>> I see two issues here: first, PUD_SIZE = 512MB, but S2_PUD_SIZE = 4TB
>> (checked using printk), and second, stage2_set_pud_huge() hangs. I'm
>> working on debugging them.
>
> I have this as an immediate fix for the set_pud_huge hang, tested
> on Seattle with 64k/42bits.
>
> I can't wait to see the back of this code...
The problem is in stage2_set_pud_huge(), because kvm_stage2_has_pmd() returns
false (CONFIG_PGTABLE_LEVELS = 2):
pudp = stage2_get_pud(mmu, cache, addr);
VM_BUG_ON(!pudp);
old_pud = *pudp;
[..]
// Returns 1 because !kvm_stage2_has_pmd()
if (stage2_pud_present(kvm, old_pud)) {
/*
* If we already have table level mapping for this block, unmap
* the range for this block and retry.
*/
		if (!stage2_pud_huge(kvm, old_pud)) { // Always true because !kvm_stage2_has_pmd()
unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
goto retry;
}
And we end up jumping back to retry forever. IMO, in user_mem_abort(), if
PUD_SIZE == PMD_SIZE, we should try to map PMD_SIZE instead of PUD_SIZE.
Maybe something like this?
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ba00bcc0c884..178267dec511 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1886,8 +1886,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* As for PUD huge maps, we must make sure that we have at least
* 3 levels, i.e, PMD is not folded.
*/
- if (vma_pagesize == PMD_SIZE ||
- (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
+ if (vma_pagesize == PUD_SIZE && !kvm_stage2_has_pmd(kvm))
+ vma_pagesize = PMD_SIZE;
+
+ if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >>
PAGE_SHIFT;
mmap_read_unlock(current->mm);
Thanks,
Alex
_______________________________________________
kvmarm mailing list
[email protected]
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm