On 03/03/2016 08:03 AM, Marc Zyngier wrote:
> On 03/03/16 13:25, Shanker Donthineni wrote:
>>
>> On 03/02/2016 11:35 AM, Marc Zyngier wrote:
>>> On 02/03/16 15:48, Shanker Donthineni wrote:
>>>
>>>> We haven't started running heavy workloads in VMs. So far we
>>>> have noticed this random behavior only during guest
>>>> kernel boot (at EL1).
>>>>
>>>> We didn't see this problem with the 4.3 kernel. Do you think it is
>>>> related to TLB conflicts?
>>> I cannot imagine why a DSB would solve a TLB conflict. But the fact that
>>> you didn't see it crashing on 4.3 is a good indication that something
>>> else is at play.
>>>
>>> In 4.5, we've rewritten a large part of KVM in C, which has changed the
>>> ordering of the various accesses a lot. It could be that a latent
>>> problem is now exposed more widely.
>>>
>>> Can you try moving this DSB around and find out what is the earliest
>>> point where it solves this problem? Some sort of bisection?
>> The furthest up I can move 'dsb ishst' is to the beginning of
>> __guest_enter(); it stops working once it is outside that function.
>>
>> I don't understand why the code below fails; the branch
>> instruction seems to be causing the problem.
>>
>>     /* Jump in the fire! */
>> +  dsb(ishst);
>>     exit_code = __guest_enter(vcpu, host_ctxt);
>>     /* And we're baaack! */
> That's very worrying. I can't see how the branch can have an influence
> on the DSB (nor why the DSB has an influence on the rest of the
> execution, btw).
>
> What if you replace the DSB with an ISB? Do you observe a similar
> behaviour (works if the barrier is in __guest_enter, but not if it is
> outside)?
I have already tried an ISB, without success. In another
experiment I flushed the stage-2 TLBs before calling __guest_enter(),
and that fixed the problem.

> Another thing worth looking at is what happened just before we decided
> to get back into the guest. Or to put it differently, what was the
> reason to exit in the first place. Was it a Stage-2 fault by any chance?

I will collect as much debug data as possible and send you the
results. I went through your refactored KVM C code and did not
find anything suspicious. I suspect Qualcomm CPUs may have very
aggressive prefetch logic that is causing the problem.

> Thanks,
>
>       M.

-- 
Shanker Donthineni
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
